On the entropy of a noisy function

Alex Samorodnitsky* 


Abstract 


Let $0 < \epsilon < 1/2$ be a noise parameter, and let $T_\epsilon$ be the noise operator acting on functions
on the boolean cube $\{0,1\}^n$. Let $f$ be a nonnegative function on $\{0,1\}^n$. We upper bound
the entropy of $T_\epsilon f$ by the average entropy of conditional expectations of $f$, given sets of
roughly $(1-2\epsilon)^2 \cdot n$ variables.

In information-theoretic terms, we prove the following strengthening of "Mrs. Gerber's
lemma": Let $X$ be a random binary vector of length $n$, and let $Z$ be a noise vector,
corresponding to a binary symmetric channel with crossover probability $\epsilon$. Then, setting
$v = (1-2\epsilon)^2 \cdot n$, we have (up to lower-order terms):

$$H(X \oplus Z) \ge n \cdot H_2\left(\epsilon + (1-2\epsilon) \cdot H_2^{-1}\left(\frac{\mathbb{E}_{|B|=v} H\left(\{X_i\}_{i \in B}\right)}{v}\right)\right)$$



Assuming $\epsilon \ge 1/2 - \delta$, for some absolute constant $\delta > 0$, this inequality, combined with a
strong version of a theorem of Friedgut, Kalai, and Naor, due to Jendrej, Oleszkiewicz, and
Wojtaszczyk, shows that if a boolean function $f$ is close to a characteristic function $g$ of a
subcube of dimension $n-1$, then the entropy of $T_\epsilon f$ is at most that of $T_\epsilon g$.

Taken together with a recent result of Ordentlich, Shayevitz, and Weinstein, this shows that
the "Most informative boolean function" conjecture of Courtade and Kumar holds for high
noise $\epsilon \ge 1/2 - \delta$.

Namely, if $X$ is uniformly distributed in $\{0,1\}^n$ and $Y$ is obtained by flipping each coordinate
of $X$ independently with probability $\epsilon$, then, provided $\epsilon \ge 1/2 - \delta$, for any boolean function
$f$ holds $I(f(X); Y) \le 1 - H_2(\epsilon)$.


1 Introduction 

This paper is motivated by the following conjecture of Courtade and Kumar [7]. 

Let $(X, Y)$ be jointly distributed in $\{0,1\}^n$ such that their marginals are uniform and $Y$ is
obtained by flipping each coordinate of $X$ independently with probability $\epsilon$. Let $H_2$ denote the
binary entropy function $H_2(x) = -x \log_2 x - (1-x) \log_2(1-x)$. The conjecture of [7] is:

Conjecture 1.1: For all boolean functions $f : \{0,1\}^n \to \{0,1\}$,

$$I(f(X); Y) \le 1 - H_2(\epsilon)$$

This inequality holds with equality if $f$ is a characteristic function of a subcube of dimension
$n-1$. Hence, the conjecture is that such functions are the "most informative" boolean functions.

Following [9], we express $I(f(X); Y)$ in terms of the 'value of the entropy functional of the
image of $f$ under the noise operator' (all notions will be defined shortly). The question then
becomes:

Which boolean functions are the "stablest" under the action of the noise operator? That is,
for which functions does the entropy functional decrease the least under noise?

One can also consider a more general question of how the noise operator affects the entropy of 
a nonnegative function. 

Our main result is that for a nonnegative function $f$ on $\{0,1\}^n$, the entropy of the image of $f$
under the noise operator with noise parameter $\epsilon$ is upper bounded by the average entropy of
conditional expectations of $f$, given sets of roughly $(1-2\epsilon)^2 \cdot n$ variables.

As an application, using the recent strengthening [6] of a theorem of [4], we show that for $\epsilon$
close to $1/2$ characteristic functions of $(n-1)$-dimensional subcubes are at least as stable under
the noise operator as functions which are close to them.

This, in conjunction with [3] and a recent result of [12] which can be used to show that, for high
noise levels $\epsilon \sim 1/2$, boolean functions, which are potentially as stable as the characteristic
functions of $(n-1)$-dimensional subcubes, have to be close to these functions, implies the
validity of Conjecture 1.1 for high noise levels.

1.1 Entropy of nonnegative functions and the noise operator 

We introduce some relevant notions. 

For a nonnegative function $f : \{0,1\}^n \to \mathbb{R}$, we define the entropy of $f$ as

$$\mathrm{Ent}(f) = \mathbb{E} f(x) \log_2 f(x) - \mathbb{E} f(x) \cdot \log_2\left(\mathbb{E} f(x)\right)$$

We note for future use that entropy is nonnegative, homogeneous ($\mathrm{Ent}(\lambda f) = \lambda \cdot \mathrm{Ent}(f)$), and
convex in $f$ [8].

Given $0 < \epsilon < 1/2$, we define the noise operator acting on functions on the boolean cube as
follows: for $f : \{0,1\}^n \to \mathbb{R}$, we let $T_\epsilon f$ at a point $x$ be the expected value of $f$ at $y$, where $y$
is $\epsilon$-correlated with $x$. That is,

$$(T_\epsilon f)(x) = \sum_{y \in \{0,1\}^n} \epsilon^{|y-x|} \cdot (1-\epsilon)^{n-|y-x|} \cdot f(y) \qquad (1)$$

Here $|y - x|$ denotes the Hamming distance between $x$ and $y$.

Note that $T_\epsilon f$ is a convex combination of shifted copies of $f$. Hence, convexity of entropy
implies that the noise operator decreases entropy. Our goal is to quantify this statement.
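As a sanity check on these definitions, the following minimal Python sketch evaluates the entropy functional and the noise operator by brute force on a small cube, and prints the entropy of the noisy function for several noise levels. The helper names `cube`, `ent`, and `noise`, as well as the test function, are illustrative choices and are not taken from the paper.

```python
import itertools
import math


def cube(n):
    """All points of the boolean cube {0,1}^n as tuples."""
    return list(itertools.product((0, 1), repeat=n))


def ent(f):
    """Ent(f) = E f log2 f - E f * log2(E f), with E over the uniform measure.

    f is a dict mapping cube points to nonnegative values (0 log 0 = 0).
    """
    size = len(f)
    mean = sum(f.values()) / size
    if mean == 0:
        return 0.0
    e_f_log_f = sum(v * math.log2(v) for v in f.values() if v > 0) / size
    return e_f_log_f - mean * math.log2(mean)


def noise(f, eps):
    """(T_eps f)(x) = sum_y eps^|y-x| (1-eps)^(n-|y-x|) f(y), as in (1)."""
    points = list(f)
    n = len(points[0])
    out = {}
    for x in points:
        total = 0.0
        for y in points:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            total += eps ** d * (1 - eps) ** (n - d) * f[y]
        out[x] = total
    return out


# Hypothetical test function on {0,1}^3: f(x) = 1 + (number of ones in x).
f = {x: 1.0 + sum(x) for x in cube(3)}
for eps in (0.0, 0.1, 0.3, 0.5):
    print(eps, ent(noise(f, eps)))
```

The printed entropies should be nonincreasing in the noise level and vanish at $\epsilon = 1/2$, where $T_\epsilon f$ is constant.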
1.1.1 Connection between notions


Let $f$ be a nonnegative function on $\{0,1\}^n$. Let $X$ be a random variable on $\{0,1\}^n$ distributed
according to $f / \sum f$. Let $Z$ be an independent noise random variable on $\{0,1\}^n$. That is,
$\Pr\{Z = z\} = \epsilon^{|z|} \cdot (1-\epsilon)^{n-|z|}$, and $X$ and $Z$ are statistically independent. Then

• $\mathrm{Ent}(f) = \mathbb{E} f \cdot \left(n - H(X)\right)$

• $\mathrm{Ent}(T_\epsilon f) = \mathbb{E} f \cdot \left(n - H(X \oplus Z)\right)$

Let now $f : \{0,1\}^n \to \{0,1\}$ be a boolean function, let $X$ be uniformly distributed in $\{0,1\}^n$,
let $Z$ be an independent noise random variable, and let $Y = X \oplus Z$. Then

$$H(f(X)) = \mathrm{Ent}(f) + \mathrm{Ent}(1-f)$$

We also have the following simple claim (proved in Section 6 below).

Lemma 1.2: For a boolean function $f : \{0,1\}^n \to \{0,1\}$,

$$I(f(X); Y) = \mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1-f))$$

Therefore, Conjecture 1.1 translates as follows:

Conjecture 1.3: (An equivalent form of Conjecture 1.1)

For any boolean function $f : \{0,1\}^n \to \{0,1\}$ holds

$$\mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1-f)) \le 1 - H_2(\epsilon)$$
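The identity of Lemma 1.2 can be checked numerically on a small cube. The sketch below computes $I(f(X); Y)$ directly from the joint law of $(f(X), Y)$ and compares it with $\mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1-f))$. All function names, and the choice of majority on three bits as a test case, are assumptions made for illustration only.

```python
import itertools
import math


def h2(p):
    """Binary entropy H_2(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def ent(values):
    """Ent of a nonnegative function given as a list of its values on the cube."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum(v * math.log2(v) for v in values if v > 0) / len(values) - mean * math.log2(mean)


def noisy_values(f, n, eps):
    """Values of T_eps f on {0,1}^n, for f given as a Python function on tuples."""
    pts = list(itertools.product((0, 1), repeat=n))
    return [
        sum(eps ** sum(a != b for a, b in zip(x, y))
            * (1 - eps) ** sum(a == b for a, b in zip(x, y)) * f(y) for y in pts)
        for x in pts
    ]


def mutual_information(f, n, eps):
    """I(f(X); Y) computed directly from the joint law of (f(X), Y)."""
    size = 2 ** n
    total = 0.0
    for b in (0, 1):
        # Pr{f(X) = b, Y = y} = 2^{-n} * (T_eps 1_{f = b})(y); Y is uniform.
        joint = [v / size for v in noisy_values(lambda x: float(f(x) == b), n, eps)]
        p_b = sum(joint)
        total += sum(p * math.log2(p / (p_b / size)) for p in joint if p > 0)
    return total


def entropy_side(f, n, eps):
    """Ent(T_eps f) + Ent(T_eps (1 - f))."""
    return (ent(noisy_values(lambda x: float(f(x)), n, eps))
            + ent(noisy_values(lambda x: 1.0 - f(x), n, eps)))


majority = lambda x: int(sum(x) >= 2)   # hypothetical example on n = 3
dictator = lambda x: int(x[0] == 0)     # characteristic function of {x_1 = 0}
eps = 0.2
print(mutual_information(majority, 3, eps), entropy_side(majority, 3, eps))
print(mutual_information(dictator, 3, eps), 1 - h2(eps))
```

For the dictator function the two printed numbers on the second line should agree, reflecting that the conjectured bound is attained by characteristic functions of $(n-1)$-dimensional subcubes.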


I 

1.2 Mrs. Gerber’s function and Mrs. Gerber’s lemma 

We describe a result from information theory, and a related function, which will be important
for us.$^1$

Let $f_t$ be a function on the two-point space $\{0,1\}$, which is $t$ at zero and $2-t$ at one. We have

$$\mathrm{Ent}(f_t) = 1 - H_2\left(\frac{t}{2}\right)$$

$^1$ We are grateful to V. Chandar [3] for explaining the relevance of this result in connection to our previous
work [18] on the subject.

Let $\phi(x, \epsilon)$ be a function on $[0,1] \times [0,1/2]$ defined as follows:

$$\phi(x, \epsilon) = \mathrm{Ent}(T_\epsilon f_t) \qquad (2)$$

where $t$ is chosen so that $\mathrm{Ent}(f_t) = x$.

This function was introduced in [21]. We will now describe some of its properties.

Note that $\phi$ is increasing in $x$, starting from zero at $x = 0$.

In fact, it is easy to derive the following explicit expression for $\phi$:

$$\phi(x, \epsilon) = 1 - H_2\left((1-2\epsilon) \cdot H_2^{-1}(1-x) + \epsilon\right)$$

A key property of $\phi$ is its concavity.

Theorem 1.4: The function $\phi(x, \epsilon)$ is concave in $x$ for any $0 < \epsilon < 1/2$.

We mention a simple corollary. 


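Corollary 1.5: For all $0 < \epsilon < 1/2$,

$$\left(1 - H_2(\epsilon)\right) \cdot x \le \phi(x, \epsilon) \le (1-2\epsilon)^2 \cdot x \qquad (3)$$

Proof: It's easy to check $\phi(0,\epsilon) = 0$ and $\phi(1,\epsilon) = 1 - H_2(\epsilon)$. And, it's easy to check that $\frac{\partial \phi}{\partial x}$
at $x = 0$ is $(1-2\epsilon)^2$. By concavity of $\phi$ (Theorem 1.4), its graph lies above the chord through the
endpoints and below the tangent at $x = 0$, which gives both bounds. ∎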
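The explicit expression for $\phi$ and the bounds in (3) are easy to probe numerically. The sketch below is one possible implementation, assuming a bisection-based inverse of $H_2$ on $[0, 1/2]$. The names `h2_inv`, `phi`, and `phi_from_definition` are hypothetical helpers, not notation from the paper.

```python
import math


def h2(p):
    """Binary entropy H_2(p) in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h2_inv(y):
    """Inverse of H_2 restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def phi(x, eps):
    """Explicit form of Mrs. Gerber's function: 1 - H_2((1-2 eps) H_2^{-1}(1-x) + eps)."""
    return 1 - h2((1 - 2 * eps) * h2_inv(1 - x) + eps)


def phi_from_definition(t, eps):
    """Ent(T_eps f_t) for f_t = (t, 2 - t) on {0,1}, computed directly from (2)."""
    a = (1 - eps) * t + eps * (2 - t)   # (T_eps f_t)(0)
    b = 2 - a                           # (T_eps f_t)(1); the mean stays 1
    return sum(v * math.log2(v) for v in (a, b) if v > 0) / 2


# Compare the two routes at t = 0.6, eps = 0.1 (arbitrary sample values).
x = 1 - h2(0.3)                         # Ent(f_t) = 1 - H_2(t/2)
print(phi(x, 0.1), phi_from_definition(0.6, 0.1))
```

Both printed values should coincide up to the accuracy of the bisection, and evaluating `phi` on a grid of $x$ gives a quick check of the two linear bounds in (3).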

From now on, when the value of $\epsilon$ is clear from the context, we omit the second parameter in
$\phi$ and write $\phi(x)$ instead of $\phi(x, \epsilon)$.

We now describe an inequality of [21], which is known as Mrs. Gerber's lemma. Following this
usage, we will refer to the function $\phi$ as Mrs. Gerber's function.


This inequality upper bounds the entropy of the image of a nonnegative function under the
action of the noise operator. We present it in terms of the entropy functional and the noise
operator.$^2$


Theorem 1.6: ([21]) Let $f$ be a nonnegative function on $\{0,1\}^n$. Then

$$\mathrm{Ent}(T_\epsilon f) \le n \, \mathbb{E} f \cdot \phi\left(\frac{\mathrm{Ent}(f)}{n \, \mathbb{E} f}, \epsilon\right) \qquad (4)$$

$^2$ As pointed out to us by Chandar [3], this is equivalent to the standard information-theoretic formulation:
Let $X$ be a random binary vector of length $n$ distributed according to $f / \sum f$, and let $Z$ be a
noise vector, corresponding to a binary symmetric channel with crossover probability $\epsilon$. Then
$H(X \oplus Z) \ge n H_2\left(\epsilon + (1-2\epsilon) \cdot H_2^{-1}\left(\frac{H(X)}{n}\right)\right)$.
1.3 Main results


For $A \subseteq [n]$ and for a nonnegative function $f : \{0,1\}^n \to \mathbb{R}$, we denote

$$\mathbb{E}(f \mid A) = \mathbb{E}\left(f \mid \{x_i\}_{i \in A}\right)$$

Here $\mathbb{E}$ is the conditional expectation operator. That is, $\mathbb{E}(f \mid A)$ is the function of the
variables $\{x_i\}_{i \in A}$, defined as the expectation of $f$ given the values of $\{x_i\}_{i \in A}$.$^3$
We write

$$\mathrm{Ent}(f \mid A) = \mathrm{Ent}\left(\mathbb{E}(f \mid A)\right)$$

To connect notions, observe that if $X$ is a random variable on $\{0,1\}^n$ distributed according to
$f / \sum f$, then the distribution of $\{X_i\}_{i \in A}$ on the $|A|$-dimensional cube is given by $\frac{1}{2^{|A|} \mathbb{E} f} \cdot \mathbb{E}(f \mid A)$
and that

$$\mathrm{Ent}(f \mid A) = \mathbb{E} f \cdot \left(|A| - H\left(\{X_i\}_{i \in A}\right)\right) \qquad (5)$$

Our main claim is that the entropy of a nonnegative function $f$ under noise is upper bounded by
the average entropy of conditional expectations of $f$, given certain random subsets of variables.
We present several results which illustrate this fact.


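Theorem 1.7: Let $f$ be a nonnegative function on the cube with $\mathbb{E} f = 1$.

Let $0 < \epsilon < 1$ be a noise parameter. Let $T$ be a random subset of $[n]$ generated by sampling
each element $i \in [n]$ independently with probability $(1-2\epsilon)^2$. Then

$$\mathrm{Ent}(T_\epsilon f) \le \mathbb{E}_T\left(\mathrm{Ent}(f \mid T) - \sum_{i \in T} \mathrm{Ent}(f \mid \{i\})\right) + \sum_{i=1}^n \phi\left(\mathrm{Ent}(f \mid \{i\})\right)$$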


Remark 1.8: We are grateful to O. Ordentlich for suggesting this formulation for the claim of
this theorem, as well as for Theorem 1.12 below (in earlier versions the average on the RHS was
taken over sets of a fixed cardinality $\sim (1-2\epsilon)^2 \cdot n$, which led to more cumbersome calculations).

Let us also mention that Polyanskiy and Wu [?] came up with a new and direct proof of the
key claim, Proposition 4.1, which does not rely on linear programming, and this was used by
Ordentlich [12] to give direct proofs for Theorems 1.7 and 1.12. ∎


Applying the inequality $\phi(x, \epsilon) \le (1-2\epsilon)^2 \cdot x$ (see (3)) to the claim of the theorem gives the
following, more streamlined claim. (However, the somewhat stronger claim of the theorem is
needed for the applications.)

$^3$ We also may (and will) view $\mathbb{E}(f \mid A)$ as a function on $\{0,1\}^n$, which depends only on variables with indices
in $A$.

Corollary 1.9: In the notation of Theorem 1.7,

$$\mathrm{Ent}(T_\epsilon f) \le \mathbb{E}_T \, \mathrm{Ent}(f \mid T)$$
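The bound of Corollary 1.9 can be evaluated exactly on a small cube by summing over all subsets $T$ with their sampling weights. The following sketch does this by brute force. The helper names and the choice of test function (majority on three bits) are assumptions made for illustration.

```python
import itertools
import math


def ent(values):
    """Ent of a nonnegative function, given as the list of its values on the cube."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    e_f_log_f = sum(v * math.log2(v) for v in values if v > 0) / len(values)
    return e_f_log_f - mean * math.log2(mean)


def noisy_entropy(f, pts, eps):
    """Ent(T_eps f), with T_eps evaluated by direct summation over the cube."""
    n = len(pts[0])
    values = []
    for x in pts:
        total = 0.0
        for y in pts:
            d = sum(a != b for a, b in zip(x, y))
            total += eps ** d * (1 - eps) ** (n - d) * f[y]
        values.append(total)
    return ent(values)


def ent_given(f, pts, subset):
    """Ent(f | A): entropy of E(f | A), viewed as a function on the whole cube."""
    sums, counts = {}, {}
    for x in pts:
        key = tuple(x[i] for i in subset)
        sums[key] = sums.get(key, 0.0) + f[x]
        counts[key] = counts.get(key, 0) + 1
    keys = [tuple(x[i] for i in subset) for x in pts]
    return ent([sums[key] / counts[key] for key in keys])


def averaged_bound(f, pts, eps):
    """E_T Ent(f | T), each index joining T independently with probability (1 - 2 eps)^2."""
    n = len(pts[0])
    lam = (1 - 2 * eps) ** 2
    total = 0.0
    for size in range(n + 1):
        for subset in itertools.combinations(range(n), size):
            total += lam ** size * (1 - lam) ** (n - size) * ent_given(f, pts, subset)
    return total


pts = list(itertools.product((0, 1), repeat=3))
f = {x: float(sum(x) >= 2) for x in pts}   # hypothetical example: majority on 3 bits
for eps in (0.1, 0.25, 0.4):
    print(eps, noisy_entropy(f, pts, eps), averaged_bound(f, pts, eps))
```

In each printed row the first entropy should not exceed the second, as the corollary asserts.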

Specializing to boolean functions, this implies the following claim.


Corollary 1.10: In the notation of Conjecture 1.1 and of Theorem 1.7, for a boolean function
$f : \{0,1\}^n \to \{0,1\}$ holds

$$I(f(X); Y) \le \mathbb{E}_T \, I\left(f(X); \{X_i\}_{i \in T}\right)$$


Remark 1.11: Let $B$ be a random subset of $[n]$ generated by sampling each element $i \in [n]$
independently with probability $1 - 2\epsilon$.

As pointed out by Or Ordentlich [?], it seems instructive to compare the bound in Corollary 1.10
to the weaker bound

$$I(f(X); Y) \le \mathbb{E}_B \, I\left(f(X); \{X_i\}_{i \in B}\right)$$

which can be obtained by the following information-theoretic argument.

An equivalent way to obtain $Y$ from $X$ is to replace each coordinate of $X$ independently with
a random bit, with probability $2\epsilon$.

Let $S$ be the set of indices where the input bits were replaced with random bits, and let $B = S^c$.
Using the chain rule of mutual information we have

$$I(f(X); Y) = I(f(X); Y, S) - I(f(X); S \mid Y) = I(f(X); Y \mid S) - I(f(X); S \mid Y)$$

where the last equality follows since $I(f(X); S) = 0$.
In particular, by non-negativity of mutual information

$$I(f(X); Y) \le I(f(X); Y \mid S) = \mathbb{E}_S \, I\left(f(X); \{X_i\}_{i \in B}\right)$$
We also show a somewhat different strengthening of Corollary 1.9, which gives a stronger version
of Mrs. Gerber's lemma (Theorem 1.6).

Theorem 1.12: In the notation of Theorem 1.7, setting $t = (1-2\epsilon)^2 \cdot n$, the following is true:

In the standard information-theoretic notation, this could be restated as follows. Let $X$ be a
random binary vector of length $n$, and let $Z$ be an independent noise vector, corresponding to
a binary symmetric channel with crossover probability $\epsilon$. Then


H 


[x®Z^ > n-H 2 


e + (1 — 2e) ■ H, 


-1 

2 


E tH^X^t) 



( 6 ) 


t 


We refer to [?] for an application of (6).


Remark 1.13:

Up to a negligible error term, the claim of the theorem is stronger than that of Theorem 1.6.
We now return to Conjectures 1.1 and 1.3.

Let us first describe a family of functions for which these conjectures are known to hold with
equality. Let $1 \le k \le n$ be an index, and let $g_k(x) = 1$ if and only if $x_k = 0$. (That is, $g_k$ is a
characteristic function of the $(n-1)$-dimensional subcube $\{x_k = 0\}$.)

It is easy to verify that $\mathrm{Ent}(T_\epsilon g_k) = \frac12 \cdot \left(1 - H_2(\epsilon)\right)$ and $\mathrm{Ent}(T_\epsilon g_k) + \mathrm{Ent}(T_\epsilon(1 - g_k)) =
1 - H_2(\epsilon)$.

We apply Theorem 1.7 to show that, for $\epsilon \sim 1/2$, the conjectures also hold for functions which
are close to characteristic functions of subcubes.

To make the notion of proximity more precise, recall (see [?]) that any function $f : \{0,1\}^n \to \mathbb{R}$
can be expanded in terms of the Walsh-Fourier basis: $f(x) = \sum_{S \subseteq [n]} \hat f(S) \cdot W_S(x)$. Here
$W_S(x) = (-1)^{\sum_{i \in S} x_i}$.

The Walsh-Fourier expansion of $g_k$ is especially simple: $\hat{g_k}(\emptyset) = \mathbb{E} g_k = 1/2$, $\hat{g_k}(\{k\}) = 1/2$,
and $\hat{g_k}(S) = 0$ for all other $S \subseteq [n]$.
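The expansion of $g_k$ can be confirmed by computing all Walsh-Fourier coefficients directly, using the standard inversion formula $\hat f(S) = \mathbb{E}_x f(x) W_S(x)$ (assumed here, as it is not spelled out above). The function name `fourier_coefficient` and the sample sizes are illustrative.

```python
import itertools


def fourier_coefficient(f, n, subset):
    """hat f(S) = E_x f(x) W_S(x) with W_S(x) = (-1)^{sum_{i in S} x_i}."""
    pts = itertools.product((0, 1), repeat=n)
    return sum(f(x) * (-1) ** sum(x[i] for i in subset) for x in pts) / 2 ** n


n, k = 4, 2                       # hypothetical sizes; k is a 0-based index here
g_k = lambda x: int(x[k] == 0)    # characteristic function of the subcube {x_k = 0}

coeffs = {
    subset: fourier_coefficient(g_k, n, subset)
    for size in range(n + 1)
    for subset in itertools.combinations(range(n), size)
}
nonzero = {s: c for s, c in coeffs.items() if c != 0}
print(nonzero)  # {(): 0.5, (2,): 0.5}
```

Only the empty set and the singleton $\{k\}$ carry nonzero coefficients, both equal to $1/2$.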

It follows from [6] and [4] that a boolean function whose Walsh-Fourier expansion is close to
that of $g_k$, in that it has a large (i.e., close to $1/2$) Fourier coefficient at $\{k\}$, has to be very
close, in the appropriate sense, to $g_k$.

The next claim shows the conjectures to hold for such functions.

Theorem 1.14: There exists an absolute constant $\delta > 0$ such that for any noise $\epsilon \ge 0$ with
$(1-2\epsilon)^2 \le \delta$ and for any boolean function $f : \{0,1\}^n \to \{0,1\}$ such that


• <E/< I; 

• There exists $1 \le k \le n$ such that $|\hat f(\{k\})| \ge (1-\delta) \cdot \mathbb{E} f$


Holds

1. $\mathrm{Ent}(T_\epsilon f) \le \frac12 \cdot \left(1 - H_2(\epsilon)\right)$

2. $\mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1-f)) \le 1 - H_2(\epsilon)$


This, in conjunction with [3] and [12], which can be used to show that, for noise parameter
close to $1/2$, boolean functions, which are potentially as stable as the characteristic functions
of $(n-1)$-dimensional subcubes, have to satisfy the constraints of Theorem 1.14, implies the
validity of Conjecture 1.1 for high noise levels.


Theorem 1.15: There exists an absolute constant $\delta > 0$ such that for any noise $\epsilon \ge 0$ with
$(1-2\epsilon)^2 \le \delta$ and for any boolean function $f : \{0,1\}^n \to \{0,1\}$ holds

$$I(f(X); Y) \le 1 - H_2(\epsilon)$$


1.4 More on Theorems 1.7 and 1.12

In this subsection we give a high-level description of the proofs of these theorems and argue
that both their claims may be viewed as strengthenings of Mrs. Gerber's lemma.

Notation: For a direction $1 \le i \le n$ we define the noise operator in direction $i$ as follows:

$$\left(T_{\epsilon\{i\}} f\right)(x) = \epsilon \cdot f(x + e_i) + (1-\epsilon) \cdot f(x)$$

where $e_i$ is the $i$-th unit vector. The operators $T_{\epsilon\{i\}}$ commute and, for $R \subseteq [n]$, we define $T_{\epsilon R}$
to be the composition of $T_{\epsilon\{i\}}$, $i \in R$. Note that the noise operator $T_\epsilon$ would be written in this
notation as $T_{\epsilon[n]}$.

We start with the proof of Mrs. Gerber's lemma (4). Since both sides of the inequality are
homogeneous in $f$, we may assume $\mathbb{E} f = 1$.

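By the chain rule for entropy, for any permutation $\sigma$ in the symmetric group $S_n$ holds

$$\mathrm{Ent}(T_\epsilon f) = \sum_{i=1}^n \left(\mathrm{Ent}\left(T_\epsilon f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(T_\epsilon f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right)\right)$$

$$= \sum_{i=1}^n \left(\mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right)\right)$$

$$\le \sum_{i=1}^n \phi\left(\mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right)\right) \qquad (7)$$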
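Let us explain the last inequality. Let $y \in \{0,1\}^{i-1}$. Let $f_y$ be a function on $\{0,1\}$ defined
by the restriction of the function $\mathbb{E}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right)$, which we view as a
function on the $i$-dimensional cube, to the points in which the coordinates $\sigma(k)$, $k = 1,\ldots,i-1$
are set to be $y_k$. Then, it is easy to see that

$$\mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i)\}} f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right) = \mathbb{E}_y \mathrm{Ent}\left(T_\epsilon f_y\right)$$

$$= \mathbb{E}_y \, \mathbb{E} f_y \cdot \phi\left(\mathrm{Ent}\left(\frac{f_y}{\mathbb{E} f_y}\right)\right) \le \phi\left(\mathbb{E}_y \mathrm{Ent}(f_y)\right) =$$

$$\phi\left(\mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right)\right)$$

The first equality in the second row follows from (2) and the linearity of entropy. The inequality
follows from concavity of the function $\phi$ and the fact that $\mathbb{E}_y \mathbb{E} f_y = \mathbb{E} \, \mathbb{E}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right)
= \mathbb{E} f = 1$.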

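We now continue from (7).

For $y \in \{0,1\}^{i-1}$, let $\tilde f_y$ be a function on $\{0,1\}$ defined by the restriction of the function
$\mathbb{E}\left(f \mid \{\sigma(1),\ldots,\sigma(i)\}\right)$ to the points in which the coordinates $\sigma(k)$, $k = 1,\ldots,i-1$ are set to
be $y_k$.

Since the noise operator $T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}}$ is stochastic, the functions $\{f_y\}$ are a stochastic mixture
of the functions $\{\tilde f_y\}$. Hence, since the Ent functional is convex, for any $0 < \epsilon < 1$ holds

$$\mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right) =$$

$$\mathbb{E}_y \mathrm{Ent}(f_y) \le \mathbb{E}_y \mathrm{Ent}(\tilde f_y) = \mathrm{Ent}\left(f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right) \qquad (8)$$

Hence $\mathrm{Ent}(T_\epsilon f)$ is upper bounded by

$$\sum_{i=1}^n \phi\left(\mathrm{Ent}\left(f \mid \{\sigma(1),\ldots,\sigma(i)\}\right) - \mathrm{Ent}\left(f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\right)\right) \le n \cdot \phi\left(\frac{\mathrm{Ent}(f)}{n}\right)$$

where in the last inequality the concavity of $\phi$ is used again.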
1.4.1 Our improvement


We attempt to quantify the loss in inequality (8).

Let us introduce some notation. For a nonnegative function $g$ on the cube, for a subset $A \subseteq [n]$,
and for an element $m \notin A$, we define

$$I_g(A, m) = \mathrm{Ent}\left(g \mid A \cup \{m\}\right) - \mathrm{Ent}(g \mid A) - \mathrm{Ent}\left(g \mid \{m\}\right)$$

This quantity is always nonnegative. In fact, let $X$ be distributed on $\{0,1\}^n$ according to
$g / \sum g$. Assume $\mathbb{E} g = 1$ and note that in this case, by Subsection 1.1.1 and by (5), we have
$I_g(A, m) = H\left(\{X_i\}_{i \in A}\right) + H(X_m) - H\left(\{X_i\}_{i \in A \cup \{m\}}\right) = I\left(\{X_i\}_{i \in A}; X_m\right)$.

Coming back to (8), observe that $\mathrm{Ent}\left(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(i)\}\right) = \mathrm{Ent}\left(f \mid \{\sigma(i)\}\right)$.

Hence, taking $A = \{\sigma(1),\ldots,\sigma(i-1)\}$ and $m = \sigma(i)$, the decrease in (8) is from $I_f(A, m)$
to $I_{T_{\epsilon A} f}(A, m)$. Therefore, our goal is to quantify the decrease in mutual information in the
presence of noise.

In the next two sections we consider a somewhat more general question of upper bounding
$I_{T_{\epsilon A} f}(A, m)$, given $f$, $A$, and $m$. In Section 2 we upper bound $I_{T_{\epsilon A} f}(A, m)$ by the value of a
certain linear program. In Section 3 we introduce a symmetric version of this program and a
symmetric solution for the symmetric program, and show its value to be at least as large as
that of the original program.

We then find the value of the symmetric solution, as a function of $f$, $A$, and $m$. This value
provides an upper bound on the noisy mutual information (see Proposition 4.1).

In order to prove Theorems 1.7 and 1.12 we apply the improved bound in (8), averaging the
chain rule for the entropy of $T_\epsilon f$ over all permutations $\sigma \in S_n$.

This improvement in (8) is the reason we suggest to view both these claims as stronger versions
of Mrs. Gerber's lemma.

On the other hand, strictly speaking, this line of argument does not necessarily provide a direct
improvement of (4), since in the averaging step we have to replace $\phi(x, \epsilon)$ by a larger linear
function $(1-2\epsilon)^2 \cdot x$, in order to be able to come up with manageable estimates.

In fact, the difference between the two claims stems from the different ways in which we apply
this "linearization" of the function $\phi(x, \epsilon)$ during averaging. The bounds they give are
incomparable, though Theorem 1.12 is a more evident improvement of (4).

We note that the two functions $\phi(x, \epsilon)$ and $(1-2\epsilon)^2 \cdot x$ almost coincide for small values of
$x$, and, loosely speaking, if the entropy of $f$ is not too large, as is the case, say, for boolean
functions, all the arguments of $\phi$ should be very close to zero, meaning not much is lost in the
linear approximation. In this case, the bounds in Theorems 1.7 and 1.12 are very close to that
in Corollary 1.9.



1.4.2 Related work


Y. Polyanskiy [?] has pointed out to us that the related question of upper bounding $I_{T_{\epsilon A} f}(A, m)$
given $I_f(A, m)$ belongs to the area of strong data processing inequalities (SDPI) in information
theory (see [?], [?] for pertinent results, and, in particular, for a new proof of Proposition 4.1).


Organization of the paper

This paper is organized as follows. The proof of Theorem 1.7 is given in Sections 2 to 4.
Theorem 1.14 is proved in Section 5. The remaining proofs are presented in Section 6.


2 A linear programming bound for noisy mutual information

In this section we upper bound the noisy mutual information $I_{T_{\epsilon A} f}(A, m)$ by the value of a
certain linear program.

Let $f$ be a nonnegative function on the cube. Let $A$ be a subset of $[n]$ and let $m \notin A$.

Let $|A| = k$. We will assume, without loss of generality, that $A = [k]$ and that $m = k+1$.

Notation: From now on, we write $\lambda$ for $(1-2\epsilon)^2$.


Discussion

Before going into details, let us give a high-level description of what the linear program attempts
to capture. For ease of discussion the notation we use here is slightly different from that in the
definition of the program below (they are the same up to scaling).

Given a random variable $X$ on $\{0,1\}^n$ distributed according to $f / \sum f$, consider a function
$I$ on the $k$-dimensional boolean cube, defined for $S \subseteq [k]$ by the mutual information
$I(S) = I\left(\{X_i\}_{i \in S}; X_{k+1}\right)$.

For $S \subseteq [k]$ and for $i \in S$, let $y_{S,i} = I(S) - I(S \setminus \{i\})$ be the "discrete derivative" of $I$ at $S$
in direction $i$. Note that $y_{S,i} \ge 0$, since this is the mutual information between $X_i$ and $X_{k+1}$,
given $\{X_j\}_{j \in S \setminus \{i\}}$. We view $y$ as a function on the edges of the cube. Note also that, for any
$S$, the value of the summation of $y$ on the edges of any path from $\emptyset$ to $S$ is $I(S)$.

For $R \subseteq [k]$, applying noise in directions in $R$ to $f$ leads to a new distribution $T_{\epsilon R} f / \sum f$
on $\{0,1\}^n$. This defines a new random variable $X^R$, a mutual information function $I^R$ and
discrete derivative functions $x^R_{S,i} = I^R(S) - I^R(S \setminus \{i\})$. (Note that $x^\emptyset = y$).

Observe that noise decreases mutual information, and hence $I^R \le I$. However, the discrete
derivatives $x^R$ do not necessarily decrease. With that, and this is a key fact, by the strong data
processing inequality [2], noise in direction $i$ decreases the discrete derivative in direction $i$ (i.e.,
the conditional mutual information between $X^R_i$ and $X^R_{k+1}$) by a factor of at least $\lambda$.

The variables in the linear program below are the values of the discrete derivatives $x^R$, while we
consider the discrete derivatives $y = x^\emptyset$ related to the initial function $f$ to be the boundary
data of the program. We note that the noisy mutual information $I^{[k]}([k]) = I_{T_{\epsilon[k]} f}([k], k+1)$ is
a linear combination of the variables, and that the strong data processing inequality provides
linear local constraints on the variables.

Finally, we would like to explain the intuition behind the symmetrization procedure in Section 3.
The fact that for any $R$ and $S$ the value of the summation of $x^R$ on the edges of any path from $\emptyset$
to $S$ is $I^R(S)$ provides a family of "symmetric" linear constraints on the variables. This makes
it natural to look for a symmetric feasible solution to the linear program (symmetrizing the
boundary data accordingly), one in which $x^R_{S,i}$ depends only on $|S|$ and on $|R \cap S|$.

We were led to expect that this symmetric solution would be an optimal one by the following
informal speculation. It turns out that the strong data processing inequality $x^R_{S,i} \le \lambda \cdot x^{R \setminus \{i\}}_{S,i}$
may be replaced by a stronger inequality $x^R_{S,i} \le \phi\left(x^{R \setminus \{i\}}_{S,i}\right)$ (see (3)).$^4$ This turns the program into
a strictly concave optimization problem, for which optimality of a feasible symmetric solution
might be anticipated. It might also be hoped that replacing the concave constraint by a
linear one would preserve this property, and this indeed turns out to be true.

More to the point, it turns out that for the symmetric solution we define, all the inequalities
$x^R_{S,i} \le \lambda \cdot x^{R \setminus \{i\}}_{S,i}$ hold with equality.

The resulting argument is straightforward, most of the work going into setting up notation, and
verifying feasibility of the symmetric solution. The key step, relying on symmetric properties
of the discrete cube, is made in Lemma 3.6.


Linear program

Boundary data: For $S \subseteq [k]$ and for $i \in S$, we write

$$y_{S,i} = \mathrm{Ent}\left(f \mid S \cup \{k+1\}\right) - \mathrm{Ent}\left(f \mid S \setminus \{i\} \cup \{k+1\}\right) - \mathrm{Ent}(f \mid S) + \mathrm{Ent}\left(f \mid S \setminus \{i\}\right)$$

The numbers $\{y_{S,i}\}$ are the boundary data for this problem.

We note that $y_{S,i} \ge 0$ for all $S$ and $i$. In fact, the value of $y_{S,i}$ is proportional to a certain
conditional mutual information. To see this, let $X$ be distributed on $\{0,1\}^n$ according to $f / \sum f$.

Assume $\mathbb{E} f = 1$ and note that, by Subsection 1.1.1 and by (5), $y_{S,i}$ is given by

$$H\left(\{X_j\}_{j \in S \setminus \{i\} \cup \{k+1\}}\right) + H\left(\{X_j\}_{j \in S}\right) - H\left(\{X_j\}_{j \in S \cup \{k+1\}}\right) - H\left(\{X_j\}_{j \in S \setminus \{i\}}\right) = I\left(X_i; X_{k+1} \mid \{X_j\}_{j \in S \setminus \{i\}}\right)$$

Variables: $x^R_{S,i}$ for $R, S \subseteq [k]$ and $i \in S$.

The optimization problem: Given the boundary data, we want to upper bound $\Gamma$, where

$$\Gamma = \max \sum_{i=1}^k x^{[k]}_{\{1,\ldots,i\},i} \qquad (9)$$

under the following constraints.

Constraints:

1. $x^\emptyset_{S,i} = y_{S,i}$

2. $x^R_{S,i} = x^{R \cap S}_{S,i}$

3. For all $\sigma, \tau \in S_k$ holds

$$\sum_{i=1}^k x^R_{\{\sigma(1),\ldots,\sigma(i)\},\sigma(i)} = \sum_{i=1}^k x^R_{\{\tau(1),\ldots,\tau(i)\},\tau(i)}$$

4. If $i \in R$ then

$$x^R_{S,i} \le \lambda \cdot x^{R \setminus \{i\}}_{S,i}$$

$^4$ This was shown in [19] if $f$ is monotone (which suffices for applications) and in [17] for general functions.

We then have the following claim.

We then have the following claim. 


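Theorem 2.1: The noisy mutual information $I_{T_{\epsilon[k]} f}([k], k+1)$ is upper bounded by the value
of the optimization problem (9).

Proof:

First, consider the boundary data. We claim that for any permutation $\sigma \in S_k$ holds

$$\sum_{i=1}^k y_{\{\sigma(1),\ldots,\sigma(i)\},\sigma(i)} = I_f([k], k+1) \qquad (10)$$

In fact, it is easy to see that the LHS is a telescopic sum, summing to

$$\mathrm{Ent}\left(f \mid [k+1]\right) - \mathrm{Ent}\left(f \mid [k]\right) - \mathrm{Ent}\left(f \mid \{k+1\}\right) = I_f([k], k+1)$$

Next we define a feasible solution for (9) whose value is $I_{T_{\epsilon[k]} f}([k], k+1)$.

Fix $R \subseteq [k]$. Write $f_R$ for $T_{\epsilon R} f$. For $S \subseteq [k]$ and $i \in S$ set

$$x^R_{S,i} = \mathrm{Ent}\left(f_R \mid S \cup \{k+1\}\right) - \mathrm{Ent}\left(f_R \mid S \setminus \{i\} \cup \{k+1\}\right) - \mathrm{Ent}(f_R \mid S) + \mathrm{Ent}\left(f_R \mid S \setminus \{i\}\right)$$

Clearly, $x^\emptyset_{S,i} = y_{S,i}$ and hence the first constraint of the program is satisfied.

As above, for any permutation $\sigma \in S_k$ holds

$$\sum_{i=1}^k x^R_{\{\sigma(1),\ldots,\sigma(i)\},\sigma(i)} = I_{T_{\epsilon R} f}([k], k+1)$$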
Hence, the third constraint is satisfied as well.
In particular,

$$\sum_{i=1}^k x^{[k]}_{\{1,\ldots,i\},i} = I_{T_{\epsilon[k]} f}([k], k+1)$$

so, the value given by this solution is indeed $I_{T_{\epsilon[k]} f}([k], k+1)$.

We continue to prove its feasibility. We claim that for any $A \subseteq [k]$ holds $\mathrm{Ent}(f_R \mid A) = \mathrm{Ent}(f_{R \cap A} \mid A)$.

To see this, note that the noise operators commute with the conditional expectation operators,
and hence

$$\mathbb{E}\left(T_{\epsilon R} f \mid A\right) = T_{\epsilon R} \mathbb{E}(f \mid A) = T_{\epsilon(R \cap A)} T_{\epsilon(R \setminus A)} \mathbb{E}(f \mid A) = T_{\epsilon(R \cap A)} \mathbb{E}(f \mid A) = \mathbb{E}\left(T_{\epsilon(R \cap A)} f \mid A\right)$$

Hence, by definition, $x^R_{S,i} = x^{R \cap S}_{S,i}$ for any $R, S \subseteq [k]$, and the second constraint holds.

To conclude the proof of the theorem, it remains to show that for any $R \subseteq S \subseteq [k]$ and $i \in R$
holds

$$x^R_{S,i} \le \lambda \cdot x^{R \setminus \{i\}}_{S,i} \qquad (11)$$


Recall that the strong data processing inequality [2] for a binary symmetric channel with
crossover probability $\epsilon$ states that if $V$ is a random variable with values in $\{0,1\}$, and $U$ is
any random variable; and if $Y = V \oplus Z$, where $Z$ is a Bernoulli random variable with parameter
$\epsilon$, statistically independent of $U$ and $V$, then $I(U; Y) \le \lambda \cdot I(U; V)$.
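The strong data processing inequality just recalled can be illustrated numerically. The sketch below passes the bit $V$ of a small joint law of $(U, V)$ through a binary symmetric channel and compares $I(U; Y)$ with $\lambda \cdot I(U; V)$. The joint distribution and all helper names are hypothetical choices made for illustration.

```python
import math


def mutual_information(joint):
    """I(A; B) in bits for a joint distribution given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def through_bsc(joint_uv, eps):
    """Joint law of (U, Y) where Y = V xor Z and Z ~ Bernoulli(eps) is independent of (U, V)."""
    joint_uy = {}
    for (u, v), p in joint_uv.items():
        for z, pz in ((0, 1 - eps), (1, eps)):
            key = (u, v ^ z)
            joint_uy[key] = joint_uy.get(key, 0.0) + p * pz
    return joint_uy


# Hypothetical joint law of (U, V): U takes three values, V is a bit.
joint_uv = {("a", 0): 0.30, ("a", 1): 0.05,
            ("b", 0): 0.10, ("b", 1): 0.25,
            ("c", 0): 0.15, ("c", 1): 0.15}
eps = 0.1
lam = (1 - 2 * eps) ** 2
print(mutual_information(through_bsc(joint_uv, eps)), lam * mutual_information(joint_uv))
```

The first printed value should not exceed the second, for any choice of the joint law and of the crossover probability.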

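Let $X$ be distributed on $\{0,1\}^n$ according to $f_{R \setminus \{i\}} / \sum f_{R \setminus \{i\}}$. Assuming, as we may, $\mathbb{E} f =
\mathbb{E} f_{R \setminus \{i\}} = 1$, we can rewrite (11) as

$$I\left(X_i \oplus Z; X_{k+1} \mid \{X_j\}_{j \in S \setminus \{i\}}\right) \le \lambda \cdot I\left(X_i; X_{k+1} \mid \{X_j\}_{j \in S \setminus \{i\}}\right)$$

which follows from applying the strong data processing inequality with $U = X_{k+1}$ and $V = X_i$,
both conditioned on $\{X_j = x_j\}_{j \in S \setminus \{i\}}$, for all values of $x_j$.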

3 The optimization problem and its symmetric version

In this section we introduce a symmetric version of the optimization problem (9) and a specific
symmetric feasible solution for the symmetric problem. We then argue that the value of this
solution for the symmetric problem is at least as large as the optimal value for the original
problem. Hence this value provides an upper bound on the noisy mutual information.

3.1 The symmetric problem and solution

Let $\{x^R_{S,i}\}$ be a feasible solution to the optimization problem (9) with boundary data $\{y_{S,i}\}$.
We define numbers $y_1, \ldots, y_k$ as follows. For $1 \le s \le k$ let

$$y_s = \mathbb{E}_{(S,i)} \, y_{S,i} \qquad (12)$$

where the expectation is taken over all pairs $(S, i)$ such that $|S| = s$ and $i \in S$.
For $0 \le r < s \le k$ we define $x^r_s$ recursively in the following manner:

$$x^r_s = \begin{cases} y_s & \text{if } r = 0 \\ \lambda \cdot x^{r-1}_s + (1-\lambda) \cdot x^{r-1}_{s-1} & \text{otherwise} \end{cases} \qquad (13)$$

We now define the symmetric version of (9), by replacing the boundary data by a new, symmetric
one. We set, for all $i \in S \subseteq [k]$ with $|S| = s$:

$$\bar y_{S,i} = y_s$$

Next, we define the symmetric solution for the symmetric problem, in the following way. For
$R \subseteq S$ with $|R| = r$ and $|S| = s$, we set

$$\bar x^R_{S,i} = \begin{cases} \lambda \cdot x^{r-1}_s & \text{if } i \in R \\ x^r_s & \text{otherwise} \end{cases}$$

and for general $R, S$ we set

$$\bar x^R_{S,i} = \bar x^{R \cap S}_{S,i}$$

Proposition 3.1: The solution above is a feasible solution of the symmetric version of (9).
Moreover, for any $R \subseteq [k]$ of cardinality $r$ and for any $\tau \in S_k$ holds

$$\sum_{i=1}^k \bar x^R_{\{\tau(1),\ldots,\tau(i)\},\tau(i)} = \sum_{j=1}^{k-r} y_j + \lambda \cdot \sum_{t=0}^{r-1} x^t_{k-r+t+1} \qquad (14)$$

Proof:

The constraints 1 and 2 of (9) hold, by the definition of $\bar x^R_{S,i}$. We pass to constraint 4. Clearly,
because of constraint 2, it suffices to prove it for $R \subseteq S$. In this case, taking $i \in R$, we have,
by the definition of $\bar x^R_{S,i}$,

$$\bar x^R_{S,i} = \lambda \cdot x^{r-1}_s = \lambda \cdot \bar x^{R \setminus \{i\}}_{S,i}$$

Next, we note that (14) will imply validity of constraint 3, since the RHS of (14) does not
depend on $\tau$.

It remains to prove (14). Let $i_1 < i_2 < \ldots < i_r$ be such that $R = \{\tau(i_1), \tau(i_2), \ldots, \tau(i_r)\}$. Then

$$\sum_{i=1}^k \bar x^R_{\{\tau(1),\ldots,\tau(i)\},\tau(i)} = \sum_{j=1}^{i_1-1} x^0_j + \left(\lambda \cdot x^0_{i_1} + \sum_{j=i_1+1}^{i_2-1} x^1_j\right) + \left(\lambda \cdot x^1_{i_2} + \sum_{j=i_2+1}^{i_3-1} x^2_j\right) + \cdots + \left(\lambda \cdot x^{r-1}_{i_r} + \sum_{j=i_r+1}^{k} x^r_j\right)$$

Expanding $x^r_s = \lambda \cdot x^{r-1}_s + (1-\lambda) \cdot x^{r-1}_{s-1}$, we have the following exchange rule:

Two adjacent summands of the form $\lambda \cdot x^{r-1}_{s-1} + x^r_s$ can always be replaced by $x^{r-1}_{s-1} + \lambda \cdot x^{r-1}_s$.
Applying this appropriate number of times in each bracket transforms the expression above
into

$$\sum_{j=1}^{i_1-1} y_j + \left(\sum_{j=i_1}^{i_2-2} y_j + \lambda \cdot y_{i_2-1}\right) + \left(\sum_{j=i_2}^{i_3-2} x^1_j + \lambda \cdot x^1_{i_3-1}\right) + \cdots + \left(\sum_{j=i_r}^{k-1} x^{r-1}_j + \lambda \cdot x^{r-1}_k\right)$$

Next we observe that the following rules apply in the original ordering of the summands: To
the right of $x^r_s$ is always either $x^r_{s+1}$ or $\lambda \cdot x^r_{s+1}$. To the right of $\lambda \cdot x^r_s$ is always either $x^{r+1}_{s+1}$ or
$\lambda \cdot x^{r+1}_{s+1}$.

Moreover, this is easily verified to be preserved by the exchange rule above, by checking the
four arising cases.

This means that applying the exchange rule as many times as needed, we can ensure all the
summands multiplied by $\lambda$ to be on the last $r$ places on the right. Since the first summand is
always either $y_1$ or $\lambda \cdot y_1$, these invariants guarantee that by doing so we obtain (14).
3.2 Optimality of the symmetric solution

Theorem 3.2: Let $\{x^R_{S,i}\}$ be a feasible solution to the linear optimization problem (9). Let
$\{\bar x^R_{S,i}\}$ be the symmetric solution for the symmetric version of this problem.

Then, for any $0 \le r \le k$ holds:

$$\mathbb{E}_{|R|=r} \sum_{i=1}^k x^R_{\{1,\ldots,i\},i} \le \mathbb{E}_{|R|=r} \sum_{i=1}^k \bar x^R_{\{1,\ldots,i\},i}$$

Corollary 3.3: The optimal value of (9) is upper bounded by the value of the symmetric
solution to the symmetric version of the problem, which is given by

$$\lambda \cdot \sum_{t=0}^{k-1} x^t_{t+1}$$

Proof: Apply the theorem with $r = k$ and use (14). ∎

Proof: (Of the theorem).

We proceed by double induction - on $k$ and on $0 \le r \le k$. For $k = 1$ the claim is easily seen to
be true.

Note also that the claim is true for any $k$ and $r = 0$. This follows from constraints 1 and 3 of
the linear program (9) and the definition of the symmetric boundary data. In fact, we have

$$\sum_{j=1}^k x^\emptyset_{\{1,\ldots,j\},j} = \sum_{j=1}^k y_{\{1,\ldots,j\},j} = \mathbb{E}_{\sigma \in S_k} \sum_{j=1}^k y_{\{\sigma(1),\ldots,\sigma(j)\},\sigma(j)} = \sum_{j=1}^k \mathbb{E}_{|S|=j,\, i \in S} \, y_{S,i} = \sum_{j=1}^k y_j = \sum_{j=1}^k \bar x^\emptyset_{\{1,\ldots,j\},j}$$

Let now numbers $r$ and $k$, with $0 < r \le k$ be given. Assume the claim holds for $k-1$, and also
for $k$, for all $0 \le t \le r-1$. We will argue it also holds for $k$ and $r$.

We start with some simple properties of the linear program (9). We assume to be given
the boundary data and a specific feasible solution to (9), and the symmetric solution to the
symmetric version of (9), as in Theorem 3.2.


Lemma 3.4: Let $M \subseteq [k]$. Let $\{y_{K,i}\}_{i \in K \subseteq M}$ be the restriction of the boundary data to subsets
of $M$. For $R \subseteq M$, let $\{x^R_{K,i}\}_{i \in K \subseteq M}$ be the restriction of the feasible solution to subsets of $M$.

Then $\{x^R_{K,i}\}_{i \in K \subseteq M}$ is a feasible solution to the appropriate (smaller) optimization problem on
$M$.

Proof:

Constraints 1, 2, and 4 are easy to check. As for constraint 3, let $\sigma$ and $\tau$ be two permutations
from $M$ to itself. Extend them in the same way to permutations $\sigma'$ and $\tau'$ on $[k]$. It is then
easy to see that constraint 3 holds for $\sigma$ and $\tau$ in the smaller problem, since it holds for $\sigma'$ and
$\tau'$ in the larger one.
Lemma 3.5: Let $M \subseteq [k]$, with $|M| = m$ and let $R \subseteq [k]$. Let $\tau$ be a bijection from $[m]$ to $M$.
Let

$$f(M,R,\tau) = \sum_{j=1}^{m} \mathcal{S}_{[k]}x^{R}_{\{\tau(1),\ldots,\tau(j)\},\tau(j)}$$

Then $f(M,R,\tau)$ depends only on $m$ and $|R \cap M|$.

Proof:

Since the symmetric solution satisfies constraint 2 of (9), we have

$$f(M,R,\tau) = \sum_{j=1}^{m} \mathcal{S}_{[k]}x^{R}_{\{\tau(1),\ldots,\tau(j)\},\tau(j)} = \sum_{j=1}^{m} \mathcal{S}_{[k]}x^{R \cap M}_{\{\tau(1),\ldots,\tau(j)\},\tau(j)} = f(M, R \cap M, \tau)$$

Let $r = |R \cap M|$.

Proceeding exactly as in the proof of Proposition 3.1 we get that

$$f(M, R \cap M, \tau) = \sum_{j=1}^{m-r} y_j + \lambda \cdot \sum_{t=0}^{r-1} x^t_{m-r+t+1}$$

That is, $f(M,R,\tau)$ depends only on $m$ and $r = |R \cap M|$, as claimed.

Next, we introduce some notation. 


3.2.1 Notation 

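1. Let $M \subseteq [k]$. Let $\{y_{K,i}\}_{i \in K \subseteq M}$ be the restriction of the boundary data to the subsets of
$M$.

We will denote by $\{\mathcal{S}_M x^R_{K,i}\}_{i \in K \subseteq M}$ the symmetric solution to the symmetric version of the
smaller problem with this boundary data.

2. Let $L \subseteq [k]$, with $L = \{i_1, \ldots, i_\ell\}$, so that $i_1 < i_2 < \ldots < i_\ell$. Let $R \subseteq [k]$. Write

$$\mu^R(L) = \sum_{j=1}^{\ell} x^R_{\{i_1,\ldots,i_j\},i_j}$$

For $L \subseteq M \subseteq [k]$, and $R \subseteq M$, we denote

$$\mathcal{S}[\mu]^R_M(L) = \sum_{j=1}^{\ell} \mathcal{S}_M x^R_{\{i_1,\ldots,i_j\},i_j}$$

Note that this quantity depends on $M$. With that, by Lemmas 3.4 and 3.5, given $M$, it
depends only on the cardinalities $|L|$ and $|R \cap L|$.

3. Using the observation in the preceding paragraph, given $R \subseteq L \subseteq M \subseteq [k]$, with $|L| = \ell$,
and $|R| = r$, we may also write $\mathcal{S}[\mu]^r_M(\ell)$ for $\mathcal{S}[\mu]^R_M(L)$.

In particular, note that the proof of Lemma 3.5 gives, in this notation

$$\mathcal{S}[\mu]^r_M(\ell) = \sum_{j=1}^{\ell-r} y_j + \lambda \cdot \sum_{t=0}^{r-1} x^t_{\ell-r+t+1} \tag{15}$$

4. Finally, for $M \subseteq [k]$ and $0 \le r \le |M|$, we write

$$\mu^r_M = \mathbb{E}_{|R|=r,\,R \subseteq M}\, \mu^R(M) \quad \text{and} \quad \mathcal{S}[\mu]^r_M = \mathbb{E}_{|R|=r,\,R \subseteq M}\, \mathcal{S}[\mu]^R_M(M)$$

We have completed introducing the new notation. In this notation the claim of the theorem
amounts to:

$$\mu^r_{[k]} \le \mathcal{S}[\mu]^r_{[k]} \tag{16}$$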

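We start with a lemma connecting the value of a solution of the optimization problem to those
of smaller problems.

Lemma 3.6:

$$\mu^r_{[k]} \le \lambda \cdot \mu^{r-1}_{[k]} + (1 - \lambda) \cdot \mathbb{E}_{i \in [k]}\, \mu^{r-1}_{[k] \setminus \{i\}} \tag{17}$$

Proof:

Since the feasible solution satisfies constraints 2 and 3 of (9), for any $i \in R \subseteq [k]$ holds

$$\mu^R([k]) = \mu^{R \setminus \{i\}}([k] \setminus \{i\}) + x^R_{[k],i}$$

Similarly, $\mu^{R \setminus \{i\}}([k]) = \mu^{R \setminus \{i\}}([k] \setminus \{i\}) + x^{R \setminus \{i\}}_{[k],i}$.

Hence, by constraint 4,

$$\mu^R([k]) - \mu^{R \setminus \{i\}}([k] \setminus \{i\}) = x^R_{[k],i} \le \lambda \cdot x^{R \setminus \{i\}}_{[k],i} = \lambda \cdot \Big(\mu^{R \setminus \{i\}}([k]) - \mu^{R \setminus \{i\}}([k] \setminus \{i\})\Big)$$

Averaging,

$$\mu^r_{[k]} = \mathbb{E}_{R \subseteq [k],\,|R|=r}\, \mu^R([k]) = \mathbb{E}_{R,\, i \in R}\, \mu^R([k]) \le \lambda \cdot \mathbb{E}_{R,\, i \in R}\, \mu^{R \setminus \{i\}}([k]) + (1 - \lambda) \cdot \mathbb{E}_{R,\, i \in R}\, \mu^{R \setminus \{i\}}([k] \setminus \{i\})$$

It remains to note

$$\mathbb{E}_{R,\, i \in R}\, \mu^{R \setminus \{i\}}([k] \setminus \{i\}) = \mathbb{E}_{i \in [k]}\, \mathbb{E}_{|T|=r-1,\, T \subseteq [k] \setminus \{i\}}\, \mu^{T}([k] \setminus \{i\}) = \mathbb{E}_{i \in [k]}\, \mu^{r-1}_{[k] \setminus \{i\}}$$

and, similarly, $\mathbb{E}_{R,\, i \in R}\, \mu^{R \setminus \{i\}}([k]) = \mu^{r-1}_{[k]}$.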


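We now prove (16), starting from (17).

First, note that, by Lemma 3.4 and by the induction hypothesis for $k-1$, we have $\mu^{r-1}_{[k] \setminus \{i\}} \le \mathcal{S}[\mu]^{r-1}_{[k] \setminus \{i\}}$,
for all $i \in [k]$.

Next, note that, by the induction hypothesis for $k$ and $r - 1$, we have $\mu^{r-1}_{[k]} \le \mathcal{S}[\mu]^{r-1}_{[k]}$.

This gives

$$\mu^r_{[k]} \le \lambda \cdot \mathcal{S}[\mu]^{r-1}_{[k]} + (1 - \lambda) \cdot \mathbb{E}_{i \in [k]}\, \mathcal{S}[\mu]^{r-1}_{[k] \setminus \{i\}}$$

This implies that to prove (16) it suffices to show the following two identities:

1. $\mathbb{E}_{i \in [k]}\, \mathcal{S}[\mu]^{r-1}_{[k] \setminus \{i\}} = \mathcal{S}[\mu]^{r-1}_{[k]}(k-1)$

2. $\mathcal{S}[\mu]^{r}_{[k]} = \lambda \cdot \mathcal{S}[\mu]^{r-1}_{[k]} + (1-\lambda) \cdot \mathcal{S}[\mu]^{r-1}_{[k]}(k-1)$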


Lemma 3.7:

$$\mathbb{E}_{i \in [k]}\, \mathcal{S}[\mu]^{r-1}_{[k] \setminus \{i\}} = \mathcal{S}[\mu]^{r-1}_{[k]}(k-1)$$

Proof: We introduce the following notation. For $i = 1, \ldots, k$ and for $0 \le r \le s \le k - 1$, let
$y_{s,i} = y_{s,[k] \setminus \{i\}}$ and $x^r_{s,i} = x^r_{s,[k] \setminus \{i\}}$.

The values on the RHS of these identities are defined as in (12) and in (13) for the corresponding
restricted problems.

We start with observing that $\mathbb{E}_{i \in [k]}\, y_{s,i} = y_s$. In fact, by definition,

$$\mathbb{E}_{i \in [k]}\, y_{s,i} = \mathbb{E}_{i \in [k]}\, \mathbb{E}_{|S|=s,\, S \subseteq [k] \setminus \{i\},\, j \in S}\, y_{S,j} = \mathbb{E}_{|S|=s,\, j \in S}\, y_{S,j} = y_s$$

Next, we claim that for all $0 \le r \le s \le k - 1$ holds $\mathbb{E}_{i \in [k]}\, x^r_{s,i} = x^r_s$.

This is easy to verify by induction on $r$. Note that we already know the claim holds for $r = 0$,
and the induction step follows directly from the definitions and the induction hypothesis.

We now apply (14) to the restricted problems, to obtain that, for each $1 \le i \le k$ holds

$$\mathcal{S}[\mu]^{r-1}_{[k] \setminus \{i\}} = \sum_{j=1}^{k-r} y_{j,i} + \lambda \cdot \sum_{t=0}^{r-2} x^t_{k-r+t+1,\,i}$$

Hence, we have:

$$\mathbb{E}_{i \in [k]}\, \mathcal{S}[\mu]^{r-1}_{[k] \setminus \{i\}} = \sum_{j=1}^{k-r} \mathbb{E}_{i \in [k]}\, y_{j,i} + \lambda \cdot \sum_{t=0}^{r-2} \mathbb{E}_{i \in [k]}\, x^t_{k-r+t+1,\,i} = \sum_{j=1}^{k-r} y_j + \lambda \cdot \sum_{t=0}^{r-2} x^t_{k-r+t+1}$$

This, by (15), equals $\mathcal{S}[\mu]^{r-1}_{[k]}(k-1)$, completing the proof of the lemma.


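Lemma 3.8:

$$\mathcal{S}[\mu]^{r}_{[k]} = \lambda \cdot \mathcal{S}[\mu]^{r-1}_{[k]} + (1-\lambda) \cdot \mathcal{S}[\mu]^{r-1}_{[k]}(k-1)$$

Proof:

The proof of this lemma is similar to that of Lemma 3.6.

Since the symmetric solution $\{\mathcal{S}_{[k]}x^R_{S,i}\}$ satisfies constraints 2
and 3 of (9), for any $i \in R \subseteq [k]$ holds

$$\mathcal{S}[\mu]^R_{[k]}([k]) = \mathcal{S}[\mu]^{R \setminus \{i\}}_{[k]}([k] \setminus \{i\}) + \mathcal{S}_{[k]}x^R_{[k],i}$$

Consider the notation we have introduced above. Using items 3 and 4 in the description of this
notation, and recalling $\mathcal{S}_{[k]}x^R_{[k],i} = \lambda \cdot x^{r-1}_k$, we can rewrite this equality as

$$\mathcal{S}[\mu]^r_{[k]} = \mathcal{S}[\mu]^{r-1}_{[k]}(k-1) + \lambda \cdot x^{r-1}_k$$

On the other hand, we have, for $i \in R \subseteq [k]$:

$$\mathcal{S}[\mu]^{R \setminus \{i\}}_{[k]}([k]) = \mathcal{S}[\mu]^{R \setminus \{i\}}_{[k]}([k] \setminus \{i\}) + \mathcal{S}_{[k]}x^{R \setminus \{i\}}_{[k],i}$$

which is the same as

$$\mathcal{S}[\mu]^{r-1}_{[k]} = \mathcal{S}[\mu]^{r-1}_{[k]}(k-1) + x^{r-1}_k$$

Combining these two identities immediately implies the claim of the lemma.

This completes the proof of (16) and of the theorem.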
3.3 The value of the symmetric optimization problem

Let $\{x^r_t\}$ be the symmetric solution for the symmetric version of (9). By Corollary 3.3 its
value depends linearly on the symmetric boundary data $y_1, \ldots, y_k$, since $\{x^r_t\}$ are fixed linear
functions of $y_1, \ldots, y_k$. Let us denote this value by $V(y_1, \ldots, y_k)$.

For $1 \le s \le k$, let $e_s$ be the initial data vector with $y_s = 1$ and all the remaining $y_t$ vanishing.
Then $V(y_1, \ldots, y_k) = \sum_{s=1}^{k} y_s \cdot V(e_s)$.

Next, we find the values of the parameters $x^r_t$ for initial data given by a unit vector.

Lemma 3.9: Let the initial data be given by the unit vector $e_s$, for some $1 \le s \le k$. Then the
values of the parameters $x^r_t$, for $0 \le r < t \le k$, are as follows.

$$x^r_t = \begin{cases} \binom{r}{t-s} \cdot \lambda^{r-t+s} \cdot (1-\lambda)^{t-s} & \text{if } s \le t \le s + r \\ 0 & \text{otherwise} \end{cases}$$

(We use the convention $\binom{0}{0} = 1$.)

Proof: The claim of the lemma is easily verifiable by induction on $r$, or by directly verifying
that (13) holds.


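Corollary 3.10:

$$V(e_s) = \lambda^s \cdot \sum_{m=0}^{k-s} \binom{s+m-1}{m} (1-\lambda)^m = 1 - \sum_{j=0}^{s-1} \binom{k}{j} \lambda^j (1-\lambda)^{k-j}$$

Proof: The first equality follows from Corollary 3.3. For the second equality, we proceed as
follows

$$V(e_s) = \frac{\lambda^s}{(s-1)!} \cdot \frac{\partial^{s-1}}{\partial x^{s-1}} \Big(1 + x + \cdots + x^{k-1}\Big) \Big|_{x = 1-\lambda} = \frac{\lambda^s}{(s-1)!} \cdot \left( \frac{\partial^{s-1}}{\partial x^{s-1}} \frac{1}{1-x} - \frac{\partial^{s-1}}{\partial x^{s-1}} \frac{x^k}{1-x} \right) \Big|_{x = 1-\lambda} = 1 - \frac{\lambda^s}{(s-1)!} \cdot \frac{\partial^{s-1}}{\partial x^{s-1}} \frac{x^k}{1-x} \Big|_{x = 1-\lambda}$$

We have

$$\frac{\partial^t}{\partial x^t} \frac{x^k}{1-x} = \sum_{i=0}^{t} \binom{t}{i} \cdot \frac{\partial^i}{\partial x^i} \frac{1}{1-x} \cdot \frac{\partial^{t-i}}{\partial x^{t-i}} x^k = \sum_{i=0}^{t} \binom{t}{i} \cdot \frac{k!}{(k-t+i)!}\, x^{k-t+i} \cdot \frac{i!}{(1-x)^{i+1}}$$

Substituting $j = t - i$ and rearranging, this is

$$t! \cdot \sum_{j=0}^{t} \binom{k}{j} \frac{x^{k-j}}{(1-x)^{t-j+1}}$$

Substituting $t = s - 1$, $x = 1 - \lambda$, and simplifying, we get

$$V(e_s) = 1 - \sum_{j=0}^{s-1} \binom{k}{j} \lambda^j (1-\lambda)^{k-j}$$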
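The two closed forms for $V(e_s)$ in Corollary 3.10 can be compared numerically. The following is a minimal sketch, not part of the original argument; the function names `v_negbin`, `v_tail` and `max_gap` are hypothetical, and the formulas are the ones stated in the corollary (a negative-binomial sum versus a binomial tail).

```python
from math import comb

def v_negbin(k, s, lam):
    """lambda^s * sum_{m=0}^{k-s} C(s+m-1, m) * (1-lambda)^m."""
    return lam**s * sum(comb(s + m - 1, m) * (1 - lam)**m
                        for m in range(k - s + 1))

def v_tail(k, s, lam):
    """1 - sum_{j=0}^{s-1} C(k, j) * lambda^j * (1-lambda)^(k-j)."""
    return 1 - sum(comb(k, j) * lam**j * (1 - lam)**(k - j)
                   for j in range(s))

def max_gap(k_max=12, lams=(0.1, 0.37, 0.8)):
    """Largest absolute difference between the two expressions on a small grid."""
    return max(abs(v_negbin(k, s, lam) - v_tail(k, s, lam))
               for lam in lams
               for k in range(1, k_max + 1)
               for s in range(1, k + 1))
```

Both expressions are the probability that a binomial variable with parameters $k$ and $\lambda$ is at least $s$, which is why they agree.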



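Corollary 3.11:

$$V(y_1, \ldots, y_k) = \sum_{s=1}^{k} y_s \cdot \left( 1 - \sum_{j=0}^{s-1} \binom{k}{j} \lambda^j (1-\lambda)^{k-j} \right)$$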


4 Proof of Theorem 1.7

We start with introducing some more notation.


4.0.1 Notation

• For a subset $S$ of $[n]$ of cardinality at most $n - 2$, and for distinct $i, j \notin S$, we set

$$Z_{S;i,j} = \mathrm{Ent}(f \mid S \cup \{i,j\}) - \mathrm{Ent}(f \mid S \cup \{i\}) - \mathrm{Ent}(f \mid S \cup \{j\}) + \mathrm{Ent}(f \mid S)$$

• For $s = 1, \ldots, n - 1$, let $t_s = \mathbb{E}_{S,i,j}\, Z_{S;i,j}$.

Here the expectation is taken over all subsets $S$ of $[n]$ of cardinality $s - 1$, and, given $S$,
over all distinct $i, j$ not in $S$.

• Let $A$ be a subset of $[n]$ of cardinality $k < n$ and let $m \notin A$. For $1 \le s \le k$, let

$$Y(A,m,s) = \mathbb{E}_{S,i}\, Z_{S;i,m}$$

where the expectation goes over subsets $S \subseteq A$ of cardinality $s - 1$, and over $i \in A \setminus S$.

• For $1 \le s \le k < n$ let

$$\Lambda(k,s,\lambda) = 1 - \sum_{j=0}^{s-1} \binom{k}{j} \lambda^j (1-\lambda)^{k-j}$$


Proposition 4.1: Let $f$ be a nonnegative function on $\{0,1\}^n$. Let $A$ be a subset of $[n]$ of
cardinality $k < n$ and let $m \notin A$.

Then

$$I_{T_{\epsilon A} f}(A,m) \le \sum_{s=1}^{k} \Lambda(k,s,\lambda) \cdot Y(A,m,s)$$

Proof:

By Theorem 2.1 the value of $I_{T_{\epsilon A} f}(A,m)$ is bounded by the value of the linear optimization
problem (9), with appropriate changes of indices.

By Theorem 3.2 this last value is upper bounded by the value of the symmetric version of the
problem, which, according to Corollary 3.11 and tracing out the appropriate changes in indices
and notation, is given by $\sum_{s=1}^{k} \Lambda(k,s,\lambda) \cdot Y(A,m,s)$.

Proof: (Of the theorem) 

The proof relies on several lemmas. We start with a technical claim. 


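Lemma 4.2: Let $1 \le s \le n - 1$ be integer parameters. Let $0 < \lambda < 1$. Then

$$\lambda \cdot \sum_{k=s}^{n-1} \Lambda(k,s,\lambda) = (\lambda n - s) + \sum_{j=0}^{s-1} \sum_{t=0}^{j} \binom{n}{t} \lambda^t (1-\lambda)^{n-t}$$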


Proof: 


71—1 

Y,A{k,s,X) 

k=s 




* i'l ±(>-« 

g (>-»-) - 





E 


j =0 



(1 - X) k ~> 


A simple calculation, similar to that in the proof of Corollary 13.101 gives 


k=s ^ ' 




X\\-X) 
The proof of the lemma is completed by summing the RHS over j, and observing


j =o t =o ' ' 


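Lemma 4.3: Let $f$ be a nonnegative function on $\{0,1\}^n$ with expectation 1. Then

$$\mathrm{Ent}(T_\epsilon f) \le \sum_{i=1}^{n} \phi\big(\mathrm{Ent}(f \mid \{i\})\big) + \sum_{s=1}^{n-1} w_s \cdot t_s$$

where

$$w_s = (\lambda n - s) + \sum_{j=0}^{s-1} \sum_{t=0}^{j} \binom{n}{t} \lambda^t (1-\lambda)^{n-t}$$

Lemma 4.4: Let $f$ be a nonnegative function on $\{0,1\}^n$. For any $0 \le u \le n - 1$ holds

$$\mathbb{E}_{|B|=u+1}\, \mathrm{Ent}(f \mid B) - (u+1) \cdot \mathbb{E}_{i \in [n]}\, \mathrm{Ent}(f \mid \{i\}) = \sum_{s=1}^{u} (u - s + 1) \cdot t_s$$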


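Next, we derive the theorem, assuming Lemmas 4.3 and 4.4 to hold.

Let $T$ be a random subset of $[n]$ generated by sampling each element $i \in [n]$ independently with
probability $\lambda$. We will show

$$\mathbb{E}_T \Big( \mathrm{Ent}(f \mid T) - \sum_{i \in T} \mathrm{Ent}(f \mid \{i\}) \Big) = \sum_{s=1}^{n-1} w_s \cdot t_s$$

Combining this with the claim of Lemma 4.3 will complete the proof.
For $0 \le k \le n$, let $p_k = \binom{n}{k} \lambda^k (1-\lambda)^{n-k}$. And, for $0 \le u \le n - 1$, let

$$\mu_u = \mathbb{E}_{|B|=u+1} \Big[ \mathrm{Ent}(f \mid B) - \sum_{i \in B} \mathrm{Ent}(f \mid \{i\}) \Big]$$

Then, using Lemma 4.4 and observing that $\mu_0 = 0$,

$$\mathbb{E}_T \Big( \mathrm{Ent}(f \mid T) - \sum_{i \in T} \mathrm{Ent}(f \mid \{i\}) \Big) = \sum_{k=2}^{n} p_k\, \mu_{k-1} = \sum_{k=2}^{n} p_k \sum_{s=1}^{k-1} (k-s) \cdot t_s = \sum_{s=1}^{n-1} \Big( \sum_{k=s+1}^{n} (k-s)\, p_k \Big) \cdot t_s$$

We conclude by verifying the identity $w_s = \sum_{k=s+1}^{n} (k-s)\, p_k$, for $s = 1, \ldots, n - 1$.
In fact,

$$w_s = (\lambda n - s) + \sum_{j=0}^{s-1} \sum_{t=0}^{j} p_t = \sum_{k=0}^{n} (k-s)\, p_k + \sum_{t=0}^{s-1} (s-t)\, p_t = \sum_{k=s+1}^{n} (k-s)\, p_k$$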
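The identity for the weights $w_s$ and the summation in Lemma 4.2 are both statements about binomial tails, so they can be checked numerically. The sketch below is an illustration only; the helper names (`p`, `Lambda`, `w_closed`, `w_tail`, `w_from_Lambda`, `max_gap`) are hypothetical, and $\Lambda(k,s,\lambda)$ is taken to be $1 - \sum_{j<s} \binom{k}{j}\lambda^j(1-\lambda)^{k-j}$ as in the notation of this section.

```python
from math import comb

def p(n, k, lam):
    """Binomial weight p_k = C(n, k) * lambda^k * (1-lambda)^(n-k)."""
    return comb(n, k) * lam**k * (1 - lam)**(n - k)

def Lambda(k, s, lam):
    """Lambda(k, s, lambda) = 1 - sum_{j < s} C(k, j) lambda^j (1-lambda)^(k-j)."""
    return 1 - sum(p(k, j, lam) for j in range(s))

def w_closed(n, s, lam):
    """w_s = (lambda*n - s) + sum_{j=0}^{s-1} sum_{t=0}^{j} p_t."""
    return (lam * n - s) + sum(p(n, t, lam)
                               for j in range(s) for t in range(j + 1))

def w_tail(n, s, lam):
    """sum_{k=s+1}^{n} (k - s) * p_k."""
    return sum((k - s) * p(n, k, lam) for k in range(s + 1, n + 1))

def w_from_Lambda(n, s, lam):
    """lambda * sum_{k=s}^{n-1} Lambda(k, s, lambda), as in Lemma 4.2."""
    return lam * sum(Lambda(k, s, lam) for k in range(s, n))

def max_gap(n=10, lams=(0.05, 0.3, 0.9)):
    """Largest disagreement among the three expressions for w_s."""
    gap = 0.0
    for lam in lams:
        for s in range(1, n):
            a, b, c = w_closed(n, s, lam), w_tail(n, s, lam), w_from_Lambda(n, s, lam)
            gap = max(gap, abs(a - b), abs(a - c))
    return gap
```

All three quantities equal the expected value of $(B - s)^+$ for a binomial variable $B$ with parameters $n$ and $\lambda$.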

It remains to prove the lemmas. 

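Proof: (Of Lemma 4.3)

Recall that, by the chain rule for noisy entropy (7), for any permutation $\sigma \in S_n$ holds that
$\mathrm{Ent}(T_\epsilon f)$ is bounded from above by

$$\sum_{i=1}^{n} \phi\Big( \mathrm{Ent}\big(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i)\}\big) - \mathrm{Ent}\big(T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f \mid \{\sigma(1),\ldots,\sigma(i-1)\}\big) \Big)$$

Using the notation introduced in Subsection 1.4.1 we can write this as

$$\sum_{i=1}^{n} \phi\Big( \mathrm{Ent}(f \mid \{\sigma(i)\}) + I_{T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f}\big(\{\sigma(1),\ldots,\sigma(i-1)\}, \sigma(i)\big) \Big)$$

Observe that the function $\phi$ is concave, and $\phi(0) = 0$. Hence $\phi(x + y) \le \phi(x) + \phi(y)$ for any
$0 \le x, y \le 1$ with $x + y \le 1$. By this subadditivity of $\phi$, the last expression is at most

$$\sum_{i=1}^{n} \phi\big(\mathrm{Ent}(f \mid \{i\})\big) + \sum_{k=1}^{n} \phi\Big( I_{T_{\epsilon\{\sigma(1),\ldots,\sigma(k-1)\}} f}\big(\{\sigma(1),\ldots,\sigma(k-1)\}, \sigma(k)\big) \Big)$$

Averaging this expression over all $\sigma \in S_n$, we obtain

$$\mathrm{Ent}(T_\epsilon f) \le \sum_{i=1}^{n} \phi\big(\mathrm{Ent}(f \mid \{i\})\big) + \mu,$$

where

$$\mu = \mathbb{E}_\sigma \sum_{k=1}^{n} \phi\Big( I_{T_{\epsilon\{\sigma(1),\ldots,\sigma(k-1)\}} f}\big(\{\sigma(1),\ldots,\sigma(k-1)\}, \sigma(k)\big) \Big)$$

Next, we upper bound $\mu$. By transitivity of action of the symmetric group and by concavity of
$\phi$ we have

$$\mu \le \sum_{k=1}^{n-1} \phi(b_k), \quad \text{where } b_k = \mathbb{E}_{A,m}\, I_{T_{\epsilon A} f}(A,m)$$

where the expectation is over all $A \subseteq [n]$ of cardinality $k$ and $m \notin A$.
Applying Proposition 4.1 we get

$$b_k \le \mathbb{E}_{A,m} \sum_{s=1}^{k} \Lambda(k,s,\lambda) \cdot Y(A,m,s) = \sum_{s=1}^{k} \Lambda(k,s,\lambda) \cdot \mathbb{E}_{A,m}\, Y(A,m,s)$$

By the definition of $Y(A,m,s)$,

$$\mathbb{E}_{A,m}\, Y(A,m,s) = \mathbb{E}_{A,m}\, \mathbb{E}_{S,i}\, Z_{S;i,m} = \mathbb{E}_{S,i,m}\, Z_{S;i,m} \cdot \mathbb{E}_{A}\, 1 = \mathbb{E}_{S,i,m}\, Z_{S;i,m}$$

where in the second expression the first expectation is over $k$-subsets $A$ of $[n]$ and $m \notin A$, and
the second expectation is over $(s-1)$-subsets $S$ of $A$ and over $i \in A \setminus S$. Rearranging, we
get the third expression in which the first expectation is over all subsets $S$ of $[n]$ of cardinality
$s - 1$ and over all distinct $i, m \notin S$, and the second expectation is over all supersets $A$ of $S$ of
cardinality $k$ with $i \in A$ and $m \notin A$.

Recalling the definition of $t_s$ above, we deduce $b_k \le \sum_{s=1}^{k} \Lambda(k,s,\lambda) \cdot t_s$.

Using the inequality $\phi(x) \le \lambda x$, and Lemma 4.2 we have

$$\mu \le \lambda \cdot \sum_{k=1}^{n-1} b_k \le \lambda \cdot \sum_{k=1}^{n-1} \sum_{s=1}^{k} \Lambda(k,s,\lambda) \cdot t_s = \sum_{s=1}^{n-1} t_s \cdot \lambda \cdot \sum_{k=s}^{n-1} \Lambda(k,s,\lambda) = \sum_{s=1}^{n-1} w_s \cdot t_s$$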

Proof: (of Lemma 4.4)

By (10), for any subset $A$ of $[n]$ of cardinality $1 \le k \le n - 1$, for any $m \notin A$, and for any
bijection $\tau : [k] \to A$ holds, in the notation of this section,

$$\sum_{s=1}^{k} Z_{\{\tau(1),\ldots,\tau(s-1)\};\tau(s),m} = I_f(A,m)$$

We now average over all the variables, setting

$$c_k = \mathbb{E}_{A,m,\tau} \sum_{s=1}^{k} Z_{\{\tau(1),\ldots,\tau(s-1)\};\tau(s),m}$$

On one hand, we have

$$c_k = \mathbb{E}_{A,m}\, I_f(A,m) = \mathbb{E}_{A,m} \Big[ \mathrm{Ent}(f \mid A \cup \{m\}) - \mathrm{Ent}(f \mid A) - \mathrm{Ent}(f \mid \{m\}) \Big] = \mathbb{E}_{|B|=k+1}\, \mathrm{Ent}(f \mid B) - \mathbb{E}_{|A|=k}\, \mathrm{Ent}(f \mid A) - \mathbb{E}_{i \in [n]}\, \mathrm{Ent}(f \mid \{i\})$$

On the other hand, similarly to the computation in the preceding lemma, we have

$$c_k = \sum_{s=1}^{k} \mathbb{E}_{A,m,\tau}\, Z_{\{\tau(1),\ldots,\tau(s-1)\};\tau(s),m} = \sum_{s=1}^{k} \mathbb{E}_{A,S,m,i}\, Z_{S;i,m} = \sum_{s=1}^{k} t_s$$

where the expectation in the third expression is over $k$-subsets $A$ of $[n]$, over $(s-1)$-subsets $S$
of $A$, over $m \notin A$ and $i \in A \setminus S$.

Hence, for any $1 \le u \le n - 1$ holds

$$\mathbb{E}_{|B|=u+1}\, \mathrm{Ent}(f \mid B) - (u+1) \cdot \mathbb{E}_{i \in [n]}\, \mathrm{Ent}(f \mid \{i\}) = \sum_{k=1}^{u} c_k = \sum_{s=1}^{u} (u - s + 1) \cdot t_s$$

completing the proof of the lemma and of the theorem.


5 Proof of Theorem 1.14

Let $\delta$ be the constant in the theorem. We will assume in the following argument that $\delta$ is
sufficiently small.

Let $0 < \epsilon < 1/2$ be a noise parameter, such that $(1 - 2\epsilon)^2 \le \delta$. Let $\lambda = (1 - 2\epsilon)^2$.

Let $f : \{0,1\}^n \to \{0,1\}$ be a boolean function, satisfying the constraints of the theorem. Let
$1 \le k \le n$ be the coordinate such that $|\hat{f}(\{k\})|$ is large. W.l.o.g. assume that $k = 1$ and that
$\hat{f}(\{1\})$ is positive.

We introduce some additional notation.

Notation

• Let $0 \le \alpha \le \delta$ be such that $\hat{f}(\{1\}) = (1 - \alpha) \cdot \mathbb{E} f$.

• Let $0 \le \beta \le \delta$ be such that $\mathbb{E} f = 1/2 - \beta$. Let $\gamma = \alpha + \beta$.

• If a < A, we define r = ^ , and define auxiliary noise e r , such that ^1 — 2e T ^ = r. 

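If $\alpha \ge \lambda$, we set $\tau = 1$ and $\epsilon_\tau = 0$.

• Let $\epsilon_1$ be such that $T_\epsilon = T_{\epsilon_1} T_{\epsilon_\tau}$. Let $\lambda_1 = (1 - 2\epsilon_1)^2$. Note that $\lambda = \tau \cdot \lambda_1$.

• Let $h = T_{\epsilon_\tau} f$. Note that $T_\epsilon f = T_{\epsilon_1} h$, and hence $\mathrm{Ent}(T_\epsilon f) = \mathrm{Ent}(T_{\epsilon_1} h)$.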


5.1 Proof of the first claim of the theorem

We start with applying Theorem 1.7 to the function $h$ with noise $\epsilon_1$. The theorem is stated for
functions with expectation 1. We modify it, using the linearity of entropy, to obtain

$$\mathrm{Ent}(T_{\epsilon_1} h) \le \mathbb{E}_T \Big( \mathrm{Ent}(h \mid T) - \sum_{i \in T} \mathrm{Ent}(h \mid \{i\}) \Big) + \mathbb{E} h \cdot \sum_{i=1}^{n} \phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{i\} \Big), \epsilon_1 \Big)$$

Here $T$ is a random subset of $[n]$ generated by sampling each element $i \in [n]$ independently
with probability $(1 - 2\epsilon_1)^2$.

Since there is more than one noise parameter involved, we now write the function $\phi$ with the
noise parameter stated explicitly.

Next, note that by (HI), for any 1 < i < n holds 


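$$\mathbb{E} h \cdot \phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{i\} \Big), \epsilon_1 \Big) \le \lambda_1 \cdot \mathrm{Ent}(h \mid \{i\})$$

Hence the previous inequality implies

$$\mathrm{Ent}(T_{\epsilon_1} h) \le \lambda_1 \cdot \mathbb{E}_{T,\, 1 \in T} \Big[ \mathrm{Ent}(h \mid T) - \mathrm{Ent}(h \mid \{1\}) \Big] + (1 - \lambda_1) \cdot \mathbb{E}_{T,\, 1 \notin T}\, \mathrm{Ent}(h \mid T) + \mathbb{E} h \cdot \phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) \tag{18}$$

The proof will be based on three lemmas, which upper bound each of the three summands on
the RHS of (18).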
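Lemma 5.1:

$$\mathbb{E}_{T,\, 1 \in T} \Big( \mathrm{Ent}(h \mid T) - \mathrm{Ent}(h \mid \{1\}) \Big) \le O\Big( \lambda_1 \cdot \gamma + \gamma^2 \ln \frac{1}{\gamma} \Big)$$

Lemma 5.2:

$$\mathbb{E}_{T,\, 1 \notin T}\, \mathrm{Ent}(h \mid T) \le O\Big( \lambda_1^2 \cdot \gamma + \lambda_1 \cdot \gamma^2 \ln \frac{1}{\gamma} \Big)$$

Lemma 5.3:

$$\mathbb{E} h \cdot \phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) \le \frac{1}{2} \cdot \big(1 - H_2(\epsilon)\big) - c \cdot \lambda \cdot \gamma$$

for an absolute constant $c > 0$.

The asymptotic notation in each of the lemmas hides absolute constants.

Given the lemmas, the first claim of the theorem is easy to verify. Indeed, recall that $\lambda_1$ is a
constant multiple of $\lambda$. Hence, the lemmas and (18) imply that

$$\mathrm{Ent}(T_\epsilon f) = \mathrm{Ent}(T_{\epsilon_1} h) \le \frac{1}{2} \cdot \big(1 - H_2(\epsilon)\big) - \Omega(\lambda \cdot \gamma) + o_{\lambda,\gamma \to 0}(\lambda \cdot \gamma)$$

Therefore, for a sufficiently small $\delta > 0$, bearing in mind that $0 \le \alpha, \beta, \lambda \le \delta$, the claim holds.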

It remains to prove the lemmas. For that purpose we will need the following version of the
logarithmic Sobolev inequality for the boolean cube.

Lemma 5.4: Let $g$ be a nonnegative function on $\{0,1\}^n$. Let $\mathcal{E}(g,g)$ be the Dirichlet form,
given by $\mathcal{E}(g,g) = \mathbb{E}_{x \in \{0,1\}^n} \sum_{y \sim x} \big(g(y) - g(x)\big)^2$. Then

$$\mathcal{E}(g,g) \ge 2 \ln 2 \cdot \mathbb{E} g \cdot \mathrm{Ent}(g)$$

Proof:

We start with a simple auxiliary claim.

Let $x_1 \ge x_2 \ge \ldots \ge x_N$ be nonnegative numbers summing to 1. Then the numbers $y_k = \frac{x_k^2}{\sum_{i=1}^{N} x_i^2}$,
for $k = 1, \ldots, N$, majorize $\{x_k\}$. That is,

$$y_1 \ge x_1, \quad y_1 + y_2 \ge x_1 + x_2, \quad \ldots, \quad y_1 + \ldots + y_N = 1 = x_1 + \ldots + x_N$$

To see this, fix some $1 \le t \le N$. We have to show $\sum_{k=1}^{t} x_k^2 \ge \big(\sum_{k=1}^{t} x_k\big) \cdot \big(\sum_{k=1}^{N} x_k^2\big)$.

We may and will assume that all of the $x_k$ are strictly positive. After some rearrangement, the
claim reduces to showing

$$\frac{\sum_{k=1}^{t} x_k^2}{\sum_{k=1}^{t} x_k} \ge \frac{\sum_{m=t+1}^{N} x_m^2}{\sum_{k=t+1}^{N} x_k}$$

This holds because the LHS is lower bounded by $x_t$, and the RHS is upper bounded by $x_{t+1}$.

A simple corollary of this claim is that for any nonnegative not identically zero function $g$ on a
finite domain endowed with uniform measure, holds that $g^2 / \mathbb{E} g^2$ majorizes $g / \mathbb{E} g$.

This is well-known to imply (see uni) that g/ E g is a convex combination of permuted versions 
of $g^2 / \mathbb{E} g^2$. Since the entropy functional is linear and convex, this implies

$$\mathrm{Ent}(g^2) \ge \frac{\mathbb{E} g^2}{\mathbb{E} g} \cdot \mathrm{Ent}(g) \ge \mathbb{E} g \cdot \mathrm{Ent}(g)$$

The claim of the lemma follows from this inequality combined with the logarithmic Sobolev
inequality [8]:

$$\mathcal{E}(g,g) \ge 2 \ln 2 \cdot \mathrm{Ent}(g^2)$$
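As a sanity check of Lemma 5.4 and of the majorization claim used in its proof, both can be tested on random nonnegative functions on a small cube. The sketch below is illustrative only; it assumes $\mathrm{Ent}(g) = \mathbb{E}\, g \log_2 g - \mathbb{E} g \log_2 \mathbb{E} g$ under the uniform measure, and the names `majorizes`, `ent`, `dirichlet` and `check` are hypothetical.

```python
import math
import random
from itertools import product

def majorizes(y, x, tol=1e-12):
    """True if y majorizes x: sorted partial sums of y dominate those of x."""
    ys, xs = sorted(y, reverse=True), sorted(x, reverse=True)
    py = px = 0.0
    for a, b in zip(ys, xs):
        py += a
        px += b
        if py < px - tol:
            return False
    return abs(py - px) <= 1e-9  # total sums must agree

def ent(vals):
    """Ent(g) = E[g log2 g] - E[g] log2 E[g], uniform measure (assumed definition)."""
    mean = sum(vals) / len(vals)
    if mean <= 0.0:
        return 0.0
    return sum(v * math.log2(v) for v in vals if v > 0) / len(vals) - mean * math.log2(mean)

def dirichlet(g, n):
    """E_x sum_{y ~ x} (g(y) - g(x))^2, where y ~ x differ in one coordinate."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            total += (g[y] - g[x]) ** 2
    return total / 2 ** n

def check(n=3, trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        g = {x: rng.random() for x in product((0, 1), repeat=n)}
        vals = list(g.values())
        s1, s2 = sum(vals), sum(v * v for v in vals)
        # g^2 / E g^2 should majorize g / E g (both normalized to sum to 1)
        if not majorizes([v * v / s2 for v in vals], [v / s1 for v in vals]):
            return False
        mean = s1 / len(vals)
        # Lemma 5.4: E(g, g) >= 2 ln 2 * E g * Ent(g)
        if dirichlet(g, n) < 2 * math.log(2) * mean * ent(vals) - 1e-12:
            return False
    return True
```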

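In the following argument we are going to use the Walsh-Fourier expansion for functions on the
boolean cube, writing a function $g$ as $\sum_{S \subseteq [n]} \hat{g}(S) \cdot W_S$, where $\{W_S\}_{S \subseteq [n]}$ is the Walsh-Fourier
basis.

In particular, for the Dirichlet form, we have $\mathcal{E}(g,g) = 4 \cdot \sum_{S \subseteq [n]} |S|\, \hat{g}^2(S)$. Hence the preceding
lemma implies

$$\mathrm{Ent}(g) \le \frac{2}{\ln 2} \cdot \frac{1}{\mathbb{E} g} \cdot \sum_{S \subseteq [n]} |S|\, \hat{g}^2(S) \tag{19}$$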
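The Fourier expression for the Dirichlet form can be confirmed by brute force on a small cube. This is a minimal sketch under the assumption that the Walsh-Fourier basis is $W_S(x) = (-1)^{\sum_{i \in S} x_i}$ and $\hat{g}(S) = \mathbb{E}_x\, g(x) W_S(x)$; the function names are hypothetical and sets $S$ are encoded as 0/1 indicator tuples.

```python
import random
from itertools import product

def fourier(g, n):
    """Walsh-Fourier coefficients of g; S is encoded as a 0/1 indicator tuple."""
    pts = list(product((0, 1), repeat=n))
    return {S: sum(g[x] * (-1) ** sum(a * b for a, b in zip(S, x)) for x in pts) / len(pts)
            for S in pts}

def dirichlet(g, n):
    """E_x sum_{y ~ x} (g(y) - g(x))^2 over neighbors y differing in one coordinate."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            total += (g[y] - g[x]) ** 2
    return total / 2 ** n

def fourier_dirichlet(g, n):
    """4 * sum_S |S| * hat{g}(S)^2."""
    return 4 * sum(sum(S) * c * c for S, c in fourier(g, n).items())

def max_gap(n=4, trials=20, seed=1):
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        g = {x: rng.uniform(-1, 1) for x in product((0, 1), repeat=n)}
        gap = max(gap, abs(dirichlet(g, n) - fourier_dirichlet(g, n)))
    return gap
```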

We will also need the following precise version of an inequality of [4], due to [6]:

Theorem 5.5: There exists a universal constant $L > 0$ with the following property. For
$g : \{0,1\}^n \to \{-1,1\}$, let $\rho = \Big( \sum_{A \subseteq [n]:\, |A| \ge 2} \hat{g}^2(A) \Big)^{1/2}$. Then there exists some $B \subseteq [n]$ with
$|B| \le 1$ such that

g 2 (A) < L-p 4 In 

AC[n]:\A\<l,A^B 

and\g(B)\ 2 >l - p 2 - L-p 4 ln(^j. 
Consider the function $f$ and recall that it satisfies the assumptions of Theorem 1.14.

Let $g = 2f - 1$. Then $g : \{0,1\}^n \to \{-1,1\}$. Note that $\hat{g}(\emptyset) = 2\hat{f}(\emptyset) - 1$, and that $\hat{g}(S) = 2\hat{f}(S)$,
for $|S| > 0$.

In particular, $\hat{g}(\emptyset) = 2\,\mathbb{E} f - 1 = -2\beta$, and $\hat{g}(\{1\}) = 2(1 - \alpha)\, \mathbb{E} f = (1 - \alpha)(1 - 2\beta)$.

Recall that $0 \le \alpha, \beta \le \delta$, and that $\gamma = \alpha + \beta$. Hence, assuming $\delta$ is sufficiently small, we have

$$\sum_{|A| \ge 2} \hat{f}^2(A) \le \sum_{|A| \ge 2} \hat{g}^2(A) \le 1 - \hat{g}^2(\{1\}) \le L \cdot \gamma \tag{20}$$

for some absolute constant $L$.

Applying Theorem 5.5 to the function $g$, we get, for a sufficiently large constant $L_1$,

$$\sum_{k=2}^{n} \hat{f}^2(\{k\}) \le \sum_{k=2}^{n} \hat{g}^2(\{k\}) \le L_1 \cdot \gamma^2 \ln\Big(\frac{1}{\gamma}\Big) \tag{21}$$


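Proof of Lemma 5.2

Fix $T \subseteq [n]$. Let $g_T = \mathbb{E}(h \mid T)$.

Note that $g_T = \sum_{S \subseteq T} \hat{h}(S) \cdot W_S$, and hence, by (19), we have

$$\mathrm{Ent}(g_T) \le \frac{2}{\ln 2} \cdot \frac{1}{\mathbb{E} g_T} \cdot \sum_{S \subseteq T} |S|\, \hat{h}^2(S) = \frac{2}{\ln 2} \cdot \frac{1}{\mathbb{E} h} \cdot \sum_{S \subseteq T} |S|\, \hat{h}^2(S)$$

Hence,

$$\mathbb{E}_{T,\, 1 \notin T}\, \mathrm{Ent}(g_T) \le \frac{2}{\ln 2} \cdot \frac{1}{\mathbb{E} h} \cdot \mathbb{E}_{T,\, 1 \notin T} \sum_{S \subseteq T} |S|\, \hat{h}^2(S) = \frac{2}{\ln 2} \cdot \frac{1}{\mathbb{E} h} \cdot \sum_{S,\, 1 \notin S} |S|\, \lambda_1^{|S|}\, \hat{h}^2(S)$$

Recall that $h = T_{\epsilon_\tau} f$. This means (see, e.g., [11]) that for any $S \subseteq [n]$, holds $\hat{h}(S) = \tau^{|S|/2} \cdot \hat{f}(S)$.

In particular, $|\hat{h}(S)| \le |\hat{f}(S)|$. Applying (20) and (21), we have that, for a sufficiently large
absolute constant $L$, the last expression is bounded by

$$L \cdot \Big( \lambda_1^2 \cdot \gamma + \lambda_1 \cdot \gamma^2 \ln \frac{1}{\gamma} \Big)$$

This concludes the proof of the lemma.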
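Proof of Lemma 5.3

Let $g = \mathbb{E}\Big( \frac{f}{\mathbb{E} f} \,\Big|\, \{1\} \Big)$. Then $g$ is a function on a 2-point space $\{0,1\}$, with $g(0) = 2 - \alpha$ and
$g(1) = \alpha$.

Observe that the noise operator commutes with the projection operator. Hence, since $h = T_{\epsilon_\tau} f$,
we have $g_1 := \mathbb{E}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big) = T_{\epsilon_\tau} g$.

Observe also that, by the definition of Mrs. Gerber's function $\phi$, we have

$$\phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) = \mathrm{Ent}(T_{\epsilon_1} g_1) = \mathrm{Ent}(T_{\epsilon_1} T_{\epsilon_\tau} g) = \mathrm{Ent}(T_\epsilon g)$$

The last equality follows from the definition of $\epsilon_1$ and $\epsilon_\tau$.

It is easy to verify that $T_\epsilon g(0) = 1 + (1 - \alpha) \cdot \lambda^{1/2}$ and that $T_\epsilon g(1) = 1 - (1 - \alpha) \cdot \lambda^{1/2}$.
Hence, $\mathrm{Ent}(T_\epsilon g) = 1 - H_2\Big( \frac{1 - (1-\alpha)\lambda^{1/2}}{2} \Big)$.

Recall that

$$H_2\Big( \frac{1 - x}{2} \Big) = 1 - \frac{1}{\ln 2} \cdot \sum_{k=1}^{\infty} \frac{x^{2k}}{2k(2k-1)}$$

with the series converging absolutely for $-1 \le x \le 1$.

Let $F(x) = 1 - H_2\Big( \frac{1 - \sqrt{x}}{2} \Big)$, for $0 \le x \le 1$. Then $F(x) = \frac{1}{\ln 2} \cdot \sum_{k=1}^{\infty} \frac{x^k}{2k(2k-1)}$.

This is a convex function on $[0,1]$, and hence for any $0 \le x \le y \le 1$ holds $F(y) - F(x) \ge
(y - x) \cdot F'(x)$. The derivative $F'$ is given by $F'(x) = \frac{1}{\ln 2} \cdot \sum_{k=1}^{\infty} \frac{x^{k-1}}{2(2k-1)}$, with the series
converging for $0 \le x < 1$.

Hence $F' \ge \frac{1}{2 \ln 2}$ on $(0,1)$, and $F(y) - F(x) \ge \frac{1}{2 \ln 2} \cdot (y - x)$. Applying this with $y = \lambda$ and
$x = (1 - \alpha)^2 \cdot \lambda$, we get

$$\big(1 - H_2(\epsilon)\big) - \mathrm{Ent}(T_\epsilon g) = F(\lambda) - F\big((1 - \alpha)^2 \cdot \lambda\big) \ge c_1 \cdot \lambda \cdot \alpha$$

where $c_1 > 0$ is an absolute constant.

In other words,

$$\mathrm{Ent}(T_\epsilon g) \le \big(1 - H_2(\epsilon)\big) - c_1 \cdot \lambda \cdot \alpha$$

To conclude the proof of the lemma, note that, for a sufficiently small $\lambda$, we have $\mathrm{Ent}(T_\epsilon g) \ge
c_2 \cdot \lambda$, for an absolute constant $c_2$, and hence

$$\mathbb{E} h \cdot \phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) = \Big( \frac{1}{2} - \beta \Big) \cdot \mathrm{Ent}(T_\epsilon g) \le \frac{1}{2} \cdot \big(1 - H_2(\epsilon)\big) - c \cdot \lambda \cdot (\alpha + \beta) = \frac{1}{2} \cdot \big(1 - H_2(\epsilon)\big) - c \cdot \lambda \cdot \gamma$$

for an absolute constant $c$. For the inequality, note that $1 - H_2(\epsilon) = F(\lambda) \ge \frac{\lambda}{2 \ln 2}$.

This completes the proof of the lemma.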
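The power series for $F$ and the lower bound on $F'$ used in the proof above can be checked numerically. The following sketch is illustrative; the names `h2`, `F_exact`, `F_series` and `F_prime_series` are hypothetical, and the series are truncated after a fixed number of terms, which is an approximation.

```python
import math

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def F_exact(x):
    """F(x) = 1 - H_2((1 - sqrt(x)) / 2)."""
    return 1.0 - h2((1.0 - math.sqrt(x)) / 2.0)

def F_series(x, terms=400):
    """Truncation of (1/ln 2) * sum_{k>=1} x^k / (2k(2k-1))."""
    return sum(x**k / (2 * k * (2 * k - 1)) for k in range(1, terms + 1)) / math.log(2)

def F_prime_series(x, terms=400):
    """Truncation of (1/ln 2) * sum_{k>=1} x^(k-1) / (2(2k-1))."""
    return sum(x**(k - 1) / (2 * (2 * k - 1)) for k in range(1, terms + 1)) / math.log(2)
```

For instance, with $\lambda = 0.1$ and $\alpha = 0.05$ one can compare `F_exact(lam) - F_exact((1 - alpha)**2 * lam)` against `lam * alpha / (2 * math.log(2))` to see the gap $c_1 \cdot \lambda \cdot \alpha$ appear.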

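The proof of Lemma 5.1 is somewhat harder. We present it in the next subsection.


5.1.1 Proof of Lemma 5.1

We proceed similarly to the proof of Lemma 5.2 and use the notation introduced in that proof.
Given a function $g$ on the boolean cube, we write $\mathbb{E}(g \mid x_1 = 0, x_2, \ldots, x_n)$ for the restriction
of $\mathbb{E}(g \mid x_1, x_2, \ldots, x_n)$ on the subcube $x_1 = 0$, and similarly for $\mathbb{E}(g \mid x_1 = 1, x_2, \ldots, x_n)$.

We note that for $g = \sum_{S \subseteq [n]} \hat{g}(S) \cdot W_S$, we have

$$\mathbb{E}(g \mid x_1 = 0, x_2, \ldots, x_n) = \sum_{R \subseteq [n],\, 1 \notin R} \big( \hat{g}(R) + \hat{g}(R \cup \{1\}) \big) \cdot W_R$$

and

$$\mathbb{E}(g \mid x_1 = 1, x_2, \ldots, x_n) = \sum_{R \subseteq [n],\, 1 \notin R} \big( \hat{g}(R) - \hat{g}(R \cup \{1\}) \big) \cdot W_R$$

We will also use the following easily verifiable identity, holding for nonnegative functions $g$:

$$\mathrm{Ent}(g) - \mathrm{Ent}(g \mid \{1\}) = \frac{1}{2} \cdot \mathrm{Ent}(g \mid x_1 = 0, x_2, \ldots, x_n) + \frac{1}{2} \cdot \mathrm{Ent}(g \mid x_1 = 1, x_2, \ldots, x_n)$$

As before, let $g_T = \mathbb{E}(h \mid T)$, for a subset $T \subseteq [n]$. Note that if $1 \in T$, then $\mathbb{E}(g_T \mid \{1\}) = \mathbb{E}(h \mid \{1\})$.

Hence

$$\mathbb{E}_{T,\, 1 \in T} \Big( \mathrm{Ent}(h \mid T) - \mathrm{Ent}(h \mid \{1\}) \Big) = \mathbb{E}_{T,\, 1 \in T} \Big( \mathrm{Ent}(g_T) - \mathrm{Ent}(g_T \mid \{1\}) \Big) = \frac{1}{2} \cdot \mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 0, x_2, \ldots, x_n) + \frac{1}{2} \cdot \mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 1, x_2, \ldots, x_n)$$

We will prove the lemma by showing that, for a sufficiently large absolute constant $L$, hold
both

$$\mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 0, x_2, \ldots, x_n) \le L \cdot \lambda_1 \cdot \gamma \tag{22}$$

and

$$\mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 1, x_2, \ldots, x_n) \le L \cdot \Big( \lambda_1 \cdot \gamma + \gamma^2 \ln \frac{1}{\gamma} \Big) \tag{23}$$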
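Proof of (22)

Fix a subset $T \subseteq [n]$, with $1 \in T$. Recall that $g_T = \sum_{S \subseteq T} \hat{h}(S) \cdot W_S$ and hence

$$\mathbb{E}(g_T \mid x_1 = 0, x_2, \ldots, x_n) = \sum_{R \subseteq T \setminus \{1\}} \big( \hat{h}(R) + \hat{h}(R \cup \{1\}) \big) \cdot W_R$$

In particular,

$$\mathbb{E}(g_T \mid x_1 = 0) = \hat{h}(\emptyset) + \hat{h}(\{1\}) = \hat{f}(\emptyset) + \tau^{1/2} \cdot \hat{f}(\{1\}) \ge \mathbb{E} f$$

Applying (19), we have, for a sufficiently large constant $L_1$,

$$\mathrm{Ent}(g_T \mid x_1 = 0, x_2, \ldots, x_n) \le \frac{2}{\ln 2} \cdot \frac{1}{\mathbb{E} f} \cdot \sum_{R \subseteq T \setminus \{1\}} |R| \cdot \big( \hat{h}(R) + \hat{h}(R \cup \{1\}) \big)^2 \le L_1 \cdot \sum_{R \subseteq T \setminus \{1\}} |R| \cdot \big( \hat{h}^2(R) + \hat{h}^2(R \cup \{1\}) \big)$$

Averaging over $T$, we have

$$\mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 0, x_2, \ldots, x_n) \le L_1 \cdot \Big( \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{h}^2(R) + \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{h}^2(R \cup \{1\}) \Big)$$

Using the fact that $|\hat{h}(S)| \le |\hat{f}(S)|$ for all $S \subseteq [n]$, and applying (20) and (21), we have, for a
sufficiently large constant $L_2$,

$$\sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{h}^2(R) \le L_2 \cdot \Big( \lambda_1 \cdot \gamma^2 \ln\Big(\frac{1}{\gamma}\Big) + \lambda_1^2 \cdot \gamma \Big)$$

and

$$\sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{h}^2(R \cup \{1\}) \le L_2 \cdot \lambda_1 \cdot \gamma$$

Summing up, this gives (22).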
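Proof of (23)

Similarly to the above,

$$\mathbb{E}(g_T \mid x_1 = 1, x_2, \ldots, x_n) = \sum_{R \subseteq T \setminus \{1\}} \big( \hat{h}(R) - \hat{h}(R \cup \{1\}) \big) \cdot W_R$$

Which means that

$$\mathbb{E}(g_T \mid x_1 = 1) = \hat{h}(\emptyset) - \hat{h}(\{1\}) = \hat{f}(\emptyset) - \tau^{1/2} \cdot \hat{f}(\{1\}) = \mathbb{E} f \cdot \big( 1 - \tau^{1/2} \cdot (1 - \alpha) \big)$$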

Recall that r 1 / 2 = 1 if a > A and r 1 / 2 = 7^ otherwise. In both cases, note that we have 
E (st I zi = l) > A - E/. 

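Applying (19), and averaging over $T$, we have, for a sufficiently large constant $L_1$,

$$\mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 1, x_2, \ldots, x_n) \le L_1 \cdot \frac{1}{\lambda} \cdot \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|} \cdot \big( \hat{h}(R) - \hat{h}(R \cup \{1\}) \big)^2$$

Let $g = \mathbb{E}(h \mid x_1 = 1, x_2, \ldots, x_n)$. Then $g = \sum_{R \subseteq [n],\, 1 \notin R} \big( \hat{h}(R) - \hat{h}(R \cup \{1\}) \big) \cdot W_R$. Hence

$$\mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 1, x_2, \ldots, x_n) \le L_1 \cdot \frac{1}{\lambda} \cdot \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{g}^2(R) \tag{24}$$

Consider the function $g$. Since $h = T_{\epsilon_\tau} f$, we have

$$g = \epsilon_\tau \cdot T_{\epsilon_\tau} \mathbb{E}(f \mid x_1 = 0, x_2, \ldots, x_n) + (1 - \epsilon_\tau) \cdot T_{\epsilon_\tau} \mathbb{E}(f \mid x_1 = 1, x_2, \ldots, x_n)$$

For $i = 0, 1$, let $f_i = \mathbb{E}(f \mid x_1 = i, x_2, \ldots, x_n)$, and let $t_i = T_{\epsilon_\tau} f_i$. Note that for $i = 0, 1$ and for
any $R$, $1 \notin R$, holds $|\hat{t}_i(R)| \le |\hat{f}_i(R)|$.

Therefore, since $g = \epsilon_\tau \cdot t_0 + (1 - \epsilon_\tau) \cdot t_1$, we have, for any $R$, $1 \notin R$, that

$$\hat{g}^2(R) \le \epsilon_\tau \cdot \hat{t}_0^{\,2}(R) + (1 - \epsilon_\tau) \cdot \hat{t}_1^{\,2}(R) \le \epsilon_\tau \cdot \hat{f}_0^{\,2}(R) + (1 - \epsilon_\tau) \cdot \hat{f}_1^{\,2}(R)$$

Hence,

$$\sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{g}^2(R) \le \epsilon_\tau \cdot \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{f}_0^{\,2}(R) + (1 - \epsilon_\tau) \cdot \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{f}_1^{\,2}(R) \tag{25}$$

Exactly as above, we have the following upper bound for the first summand: For a sufficiently
large constant $L_2$ holds

$$\sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{f}_0^{\,2}(R) = \sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|} \big( \hat{f}(R) + \hat{f}(R \cup \{1\}) \big)^2 \le L_2 \cdot \lambda_1 \cdot \gamma$$

Consider the second summand. The function $f_1$ is a boolean function, whose expectation equals
$\hat{f}(\emptyset) - \hat{f}(\{1\}) = \alpha \cdot \mathbb{E} f \le \alpha$. Similarly, $\mathbb{E} f_1^2 = \mathbb{E} f_1 \le \alpha$.

We now apply the inequality of [20], which states that

For a boolean function $g : \{0,1\}^m \to \{0,1\}$ with expectation $\mathbb{E} g \le 1/2$ holds $\sum_{k=1}^{m} \hat{g}^2(\{k\}) \le
L_3 \cdot (\mathbb{E} g)^2 \cdot \ln(1/\mathbb{E} g)$, for a sufficiently large absolute constant $L_3$.

In our case, this implies $\sum_{k=2}^{n} \hat{f}_1^{\,2}(\{k\}) \le L_3 \cdot \alpha^2 \cdot \ln\big(\frac{1}{\alpha}\big)$, for a sufficiently large constant $L_3$.

This means that, for a sufficiently large constant $L_4$, we can upper bound the second summand
in (25) by

$$\sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{f}_1^{\,2}(R) \le L_4 \cdot \Big( \lambda_1 \cdot \alpha^2 \ln\Big(\frac{1}{\alpha}\Big) + \lambda_1^2 \cdot \alpha \Big)$$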


Recall that for a < A, we have e r = 1 / / = 1 °0 < L 5 • A, for an absolute constant 

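$L_5$; and that for $\alpha \ge \lambda$, we have $\epsilon_\tau = 0$. Plugging these estimates into (25), we have

$$\sum_{R,\, 1 \notin R} |R|\, \lambda_1^{|R|}\, \hat{g}^2(R) \le L_2 \cdot L_5 \cdot \lambda \cdot \lambda_1 \cdot \gamma + L_4 \cdot \Big( \lambda_1 \cdot \alpha^2 \ln\Big(\frac{1}{\alpha}\Big) + \lambda_1^2 \cdot \alpha \Big)$$

And hence, coming back to (24), and recalling that $\lambda = \tau \cdot \lambda_1$, we have, for sufficiently large
absolute constants $L$, $L'$, that

$$\mathbb{E}_{T,\, 1 \in T}\, \mathrm{Ent}(g_T \mid x_1 = 1, x_2, \ldots, x_n) \le L' \cdot \Big( \lambda_1 \cdot \gamma + \alpha^2 \ln\Big(\frac{1}{\alpha}\Big) + \lambda_1 \cdot \alpha \Big) \le L \cdot \Big( \lambda_1 \cdot \gamma + \gamma^2 \ln\Big(\frac{1}{\gamma}\Big) \Big)$$

This completes the proof of (23), of Lemma 5.1 and of the first claim of the theorem.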
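5.2 Proof of the second claim of the theorem

First, note that if $f$ is balanced, that is $\mathbb{E} f = \frac{1}{2}$, then so is $1 - f$, and the second claim of the
theorem follows immediately from the first claim.

If $\mathbb{E} f \ne \frac{1}{2}$, some additional work is required. We only sketch the argument below, since it is
very similar to the proof of the first claim.

Applying Theorem 1.7 to the function $1 - f$ gives (cf. (18))

$$\mathrm{Ent}(T_\epsilon(1 - f)) = \mathrm{Ent}(T_{\epsilon_1}(1 - h)) \le \lambda_1 \cdot \mathbb{E}_{T,\, 1 \in T} \Big( \mathrm{Ent}((1-h) \mid T) - \mathrm{Ent}((1-h) \mid \{1\}) \Big) + (1 - \lambda_1) \cdot \mathbb{E}_{T,\, 1 \notin T}\, \mathrm{Ent}((1-h) \mid T) + \mathbb{E}(1-h) \cdot \phi\Big( \mathrm{Ent}\Big( \frac{1-h}{\mathbb{E}(1-h)} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) \tag{26}$$

As in the proof of the first claim, we upper bound each of the three summands on the RHS of
(26) separately.

Repeating the argument, with the necessary (minor) differences, leads to the same first two
bounds:

$$\mathbb{E}_{T,\, 1 \in T} \Big( \mathrm{Ent}((1-h) \mid T) - \mathrm{Ent}((1-h) \mid \{1\}) \Big) \le O\Big( \lambda_1 \cdot \gamma + \gamma^2 \ln \frac{1}{\gamma} \Big)$$

$$\mathbb{E}_{T,\, 1 \notin T}\, \mathrm{Ent}((1-h) \mid T) \le O\Big( \lambda_1^2 \cdot \gamma + \lambda_1 \cdot \gamma^2 \ln \frac{1}{\gamma} \Big)$$

Indeed, this should not be surprising since, roughly speaking, these two bounds for $h$ are
obtained by analysing the behavior of (the squares of) its non-trivial Fourier coefficients, and
this is the same for $h$ and for $1 - h$.

As to the third summand, we will follow the argument in the proof of Lemma 5.3.

Let $g = \mathbb{E}\Big( \frac{1-f}{\mathbb{E}(1-f)} \,\Big|\, \{1\} \Big)$. This is a function on a 2-point space $\{0,1\}$, with $g(0) = \rho$ and
$g(1) = 2 - \rho$, where $\rho = 1 + \frac{(1-\alpha)(1-2\beta)}{1+2\beta}$.

Note that $\rho \le 2 - c \cdot \gamma$, for some absolute constant $c > 0$. Hence, proceeding as in the proof of
Lemma 5.3, gives

$$\phi\Big( \mathrm{Ent}\Big( \frac{1-h}{\mathbb{E}(1-h)} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) = \mathrm{Ent}(T_\epsilon g) \le \big(1 - H_2(\epsilon)\big) - c' \cdot \lambda \cdot \gamma,$$

for an absolute constant $c' > 0$.

Next, recall that in the proof of Lemma 5.3 we show $\phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) \le \big(1 - H_2(\epsilon)\big) -
c_1 \cdot \lambda \cdot \alpha$ for an absolute constant $c_1 > 0$.

Hence

$$\mathbb{E} h \cdot \phi\Big( \mathrm{Ent}\Big( \frac{h}{\mathbb{E} h} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) + \mathbb{E}(1-h) \cdot \phi\Big( \mathrm{Ent}\Big( \frac{1-h}{\mathbb{E}(1-h)} \,\Big|\, \{1\} \Big), \epsilon_1 \Big) \le \big(1 - H_2(\epsilon)\big) - c_2 \cdot \lambda \cdot \gamma,$$

for an absolute constant $c_2 > 0$.

We can now complete the proof of the second claim of the theorem.
Combining all bounds on the right hand sides of (18) and of (26) above gives

$$\mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1 - f)) \le \big(1 - H_2(\epsilon)\big) - \Omega(\lambda \cdot \gamma) + o_{\lambda,\gamma \to 0}(\lambda \cdot \gamma)$$

Since $\lambda, \gamma \le O(\delta)$, this implies that for a sufficiently small $\delta > 0$ holds

$$\mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1 - f)) \le 1 - H_2(\epsilon)$$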


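6 Remaining proofs

6.1 Proof of Lemma 1.2

We have, for a boolean function $f$:

$$I(f(X); Y) = H(f(X)) - H(f(X) \mid Y) = H(f(X)) - \mathbb{E}_y H(f(X) \mid Y = y) = H_2(\mathbb{E} f) - \mathbb{E}_y H_2\big( (T_\epsilon f)(y) \big)$$

We have $H_2(\mathbb{E} f) = \mathbb{E} f \log \frac{1}{\mathbb{E} f} + (1 - \mathbb{E} f) \log \frac{1}{1 - \mathbb{E} f}$.
We also have (all the logarithms are binary)

$$\mathbb{E}_y H_2\big( (T_\epsilon f)(y) \big) = \mathbb{E}_y \Big( (T_\epsilon f)(y) \log \frac{1}{(T_\epsilon f)(y)} + \big(1 - (T_\epsilon f)(y)\big) \log \frac{1}{1 - (T_\epsilon f)(y)} \Big)$$

$$= -\Big( \mathrm{Ent}(T_\epsilon f) + \mathbb{E}\, T_\epsilon f \log \mathbb{E}\, T_\epsilon f \Big) - \Big( \mathrm{Ent}(T_\epsilon(1 - f)) + \mathbb{E}\, T_\epsilon(1 - f) \log \mathbb{E}\, T_\epsilon(1 - f) \Big)$$

$$= -\Big( \mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1 - f)) \Big) + \mathbb{E} f \log \frac{1}{\mathbb{E} f} + (1 - \mathbb{E} f) \log \frac{1}{1 - \mathbb{E} f}$$

In the last step we have used the fact $\mathbb{E}\, T_\epsilon g = \mathbb{E} g$ for any function $g$. The claim of the lemma
follows.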
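The identity $I(f(X);Y) = \mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1-f))$ obtained in this proof can be verified by brute force for every boolean function on a small cube. The sketch below is illustrative only; it assumes $\mathrm{Ent}(g) = \mathbb{E}\, g \log_2 g - \mathbb{E} g \log_2 \mathbb{E} g$ under the uniform measure and computes the mutual information directly from the joint law of $(f(X), Y)$. All function names are hypothetical.

```python
import math
from itertools import product

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def noise(f, n, eps):
    """(T_eps f)(y) = sum_x eps^d(x,y) * (1-eps)^(n-d(x,y)) * f(x)."""
    pts = list(product((0, 1), repeat=n))
    out = {}
    for y in pts:
        total = 0.0
        for x in pts:
            d = sum(a != b for a, b in zip(x, y))
            total += f[x] * eps**d * (1 - eps)**(n - d)
        out[y] = total
    return out

def ent(g):
    """Ent(g) = E[g log2 g] - E[g] log2 E[g], uniform measure (assumed definition)."""
    vals = list(g.values())
    mean = sum(vals) / len(vals)
    if mean <= 0.0:
        return 0.0
    return sum(v * math.log2(v) for v in vals if v > 0) / len(vals) - mean * math.log2(mean)

def mutual_information(f, n, eps):
    """I(f(X); Y) from the joint law: X uniform, Y = X with each bit flipped w.p. eps."""
    pts = list(product((0, 1), repeat=n))
    tf = noise(f, n, eps)              # Pr[f(X) = 1 | Y = y]
    p1 = sum(f.values()) / len(pts)    # Pr[f(X) = 1]
    info = 0.0
    for y in pts:
        for pb, cond in ((p1, tf[y]), (1 - p1, 1 - tf[y])):
            if cond > 0 and pb > 0:
                info += cond / len(pts) * math.log2(cond / pb)
    return info

def max_identity_gap(n=3, eps=0.2):
    """Largest |I(f(X);Y) - Ent(T f) - Ent(T(1-f))| over all boolean f on {0,1}^n."""
    pts = list(product((0, 1), repeat=n))
    gap = 0.0
    for bits in product((0, 1), repeat=len(pts)):
        f = dict(zip(pts, bits))
        g = {x: 1 - v for x, v in f.items()}
        lhs = mutual_information(f, n, eps)
        rhs = ent(noise(f, n, eps)) + ent(noise(g, n, eps))
        gap = max(gap, abs(lhs - rhs))
    return gap
```

For the dictator function $f(x) = x_1$ the same code returns $1 - H_2(\epsilon)$, the value appearing on the right hand side of Conjecture 1.1.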
6.2 Proof of Corollary 1.10

Applying Corollary 1.9 to the functions $f$ and $1 - f$, we obtain, by Lemma 1.2

$$I(f(X); Y) = \mathrm{Ent}(T_\epsilon f) + \mathrm{Ent}(T_\epsilon(1 - f)) \le \mathbb{E}_T \Big( \mathrm{Ent}(f \mid T) + \mathrm{Ent}((1 - f) \mid T) \Big)$$

To conclude the proof of the corollary, it suffices to show that for any $T \subseteq [n]$ holds

$$\mathrm{Ent}(f \mid T) + \mathrm{Ent}((1 - f) \mid T) = I\big( f(X); \{X_i\}_{i \in T} \big)$$

To see this, we proceed exactly as in the proof of Lemma 1.2, observing that, by definition,

$$\Pr\big\{ f(X) = 1 \mid \{X_i\}_{i \in T} \big\} = \mathbb{E}(f \mid T)$$

Here we interpret both sides as functions of $\{x_i\}$, $i \in T$.


6.3 Proof of Theorem 1.12

The proof of this theorem is very similar to that of Theorem 1.7 and uses the notation and
some of the results from that proof.

As in the proof of Lemma 4.3 our starting point is the chain rule for noisy entropy (7), which
states that for any permutation $\sigma \in S_n$ the noisy entropy $\mathrm{Ent}(T_\epsilon f)$ is bounded from above by

$$\sum_{i=1}^{n} \phi\Big( \mathrm{Ent}(f \mid \{\sigma(i)\}) + I_{T_{\epsilon\{\sigma(1),\ldots,\sigma(i-1)\}} f}\big(\{\sigma(1),\ldots,\sigma(i-1)\}, \sigma(i)\big) \Big)$$

Averaging over $\sigma \in S_n$ and using transitivity of action of the symmetric group and concavity
of $\phi$, this is at most

$$\sum_{k=0}^{n-1} \phi\Big( \mathbb{E}_{i \in [n]}\, \mathrm{Ent}(f \mid i) + b_k \Big), \quad \text{where } b_k = \mathbb{E}_{A,m}\, I_{T_{\epsilon A} f}(A, m)$$

and the expectation is over all $A \subseteq [n]$ of cardinality $k$ and $m \notin A$. (In particular, we set
$b_0 = 0$). Using the concavity of $\phi$ again, this is at most

$$n \cdot \phi\Big( \mathbb{E}_{i \in [n]}\, \mathrm{Ent}(f \mid i) + \frac{1}{n} \cdot \sum_{k=0}^{n-1} b_k \Big)$$

The analysis in the proof of Theorem 1.7 shows that if $T$ is a random subset of $[n]$ generated
by sampling each element $i \in [n]$ independently with probability $\lambda$ then

$$\sum_{k=0}^{n-1} b_k \le \frac{1}{\lambda} \cdot \sum_{s=1}^{n-1} w_s \cdot t_s = \frac{1}{\lambda} \cdot \mathbb{E}_T \Big( \mathrm{Ent}(f \mid T) - \sum_{i \in T} \mathrm{Ent}(f \mid i) \Big) = \frac{1}{\lambda} \cdot \mathbb{E}_T\, \mathrm{Ent}(f \mid T) - n \cdot \mathbb{E}_{i \in [n]}\, \mathrm{Ent}(f \mid i)$$

Substituting and simplifying, we get, setting $t = \lambda n$,

$$\mathrm{Ent}(T_\epsilon f) \le n \cdot \phi\Big( \frac{\mathbb{E}_T\, \mathrm{Ent}(f \mid T)}{t} \Big)$$

which is the claim of the theorem.


6.3.1 Proof of © 


Let $f$ be the distribution of $X$ multiplied by $2^n$. Then $\mathbb{E} f = 1$, and Theorem 1.12 can be
applied.

By Section n.l.ll and ©, we have Ent(^T 6 f^j = n — h(^X © Z^j and Ent(^f \ T^j = |T| — 

We also recall <t>(x, = 1 — H 2 + (1 — 2e) • IT/^l — x)J. 

Substituting in the claim of the theorem, and simplifying, gives 


h[x®Z) > n-H 2 e + (1 - 2e) ■ H, 


i /E tH^X^t 


which is the claim of 
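
The resulting inequality can be evaluated numerically for small $n$. The sketch below assumes the reading used in this section: $T$ is generated by including each coordinate independently with probability $\lambda = (1 - 2\epsilon)^2$, $t = \lambda n$, $0 \le \epsilon < 1/2$, and $H_2^{-1}$ is the inverse of $H_2$ on $[0, 1/2]$. All function names are ours and purely illustrative.

```python
import itertools
import math


def h2(p):
    """Binary entropy H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h2_inv(y, iterations=100):
    """Inverse of H_2 restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def entropy(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(dist, T):
    """Distribution of the coordinates indexed by T."""
    out = {}
    for x, p in dist.items():
        key = tuple(x[i] for i in T)
        out[key] = out.get(key, 0.0) + p
    return out


def noisy(dist, eps, n):
    """Distribution of X xor Z, where Z has i.i.d. Bernoulli(eps) coordinates."""
    out = {}
    for x, p in dist.items():
        for z in itertools.product((0, 1), repeat=n):
            w = sum(z)
            y = tuple(a ^ b for a, b in zip(x, z))
            out[y] = out.get(y, 0.0) + p * eps ** w * (1 - eps) ** (n - w)
    return out


def both_sides(dist, eps, n):
    """Return (h(X xor Z), n * H_2(eps + (1 - 2 eps) * H_2^{-1}(E_T H(X_T) / t)))."""
    lam = (1 - 2 * eps) ** 2
    t = lam * n
    avg = 0.0
    for r in range(n + 1):
        for T in itertools.combinations(range(n), r):
            weight = lam ** r * (1 - lam) ** (n - r)  # probability of drawing this T
            avg += weight * entropy(marginal(dist, T))
    lhs = entropy(noisy(dist, eps, n))
    rhs = n * h2(eps + (1 - 2 * eps) * h2_inv(avg / t))
    return lhs, rhs


def product_bernoulli(p, n):
    """Product distribution with i.i.d. Bernoulli(p) coordinates."""
    cube = itertools.product((0, 1), repeat=n)
    return {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in cube}
```

For a product distribution both sides coincide, since then $H\big(\{X_i\}_{i \in T}\big) = |T| \cdot H_2(p)$. For $X$ uniform on $\{000, 111\}$ the left-hand side is strictly larger.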


6.4 Proof of Theorem 1.15

Let $\delta$ be the constant in the theorem. We will assume that $\delta$ is sufficiently small.

Let $\epsilon$ be a noise parameter, such that $(1 - 2\epsilon)^2 \le \delta$. Denote $\lambda = (1 - 2\epsilon)^2$.

It is known (see [7]) that for any boolean function $f$ holds $I\big(f(X); Y\big) \le \lambda \cdot H_2(\mathbb{E} f)$. This
immediately implies the validity of Conjecture 1.1 for boolean functions with expectation lying
in $[0, c] \cup [1 - c, 1]$, for some absolute constant $0 < c < 1/2$.
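
One admissible value of $c$ can be computed explicitly. The sketch below assumes the elementary bound $1 - H_2(\epsilon) \ge \lambda / (2 \ln 2)$, which is also recalled later in this proof: if $H_2(\mathbb{E} f) \le 1/(2 \ln 2)$, then $I\big(f(X); Y\big) \le \lambda \cdot H_2(\mathbb{E} f) \le \lambda/(2 \ln 2) \le 1 - H_2(\epsilon)$. The names `threshold_c` and `gap` are ours and purely illustrative.

```python
import math


def h2(p):
    """Binary entropy H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def threshold_c(iterations=100):
    """Largest c in [0, 1/2] with H_2(c) <= 1 / (2 ln 2), found by bisection."""
    target = 1 / (2 * math.log(2))
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h2(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def gap(eps):
    """(1 - H_2(eps)) - lambda / (2 ln 2), with lambda = (1 - 2 eps)^2."""
    lam = (1 - 2 * eps) ** 2
    return (1 - h2(eps)) - lam / (2 * math.log(2))
```

Here `threshold_c()` lies slightly below $0.2$, and `gap(eps)` is nonnegative on a grid of $\epsilon \in [0, 1/2]$, consistent with the assumed bound. Any smaller positive constant would serve equally well as $c$.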
In addition, we may assume, by symmetry, that $\mathbb{E} f \le 1/2$. Combining these two observations,
it remains to consider the case

\[
c \;\le\; \mathbb{E} f \;\le\; 1/2. \qquad (27)
\]

Let $f$ be a boolean function satisfying (27) with $I\big(f(X); Y\big) \ge 1 - H_2(\epsilon)$. This is the same as
$\mathrm{Ent}\big(T_\epsilon f\big) + \mathrm{Ent}\big(T_\epsilon (1 - f)\big) \ge 1 - H_2(\epsilon)$.

At this point, we need a technical lemma. 


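Lemma 6.1: For any nonnegative non-zero function $f$ holds$^5$

\[
\mathrm{Ent}\big(T_\epsilon f\big) \;\le\; \frac{1}{\mathbb{E} f} \cdot \frac{1}{2 \ln 2} \cdot \left( \sum_{k=1}^{n} \hat{f}^2(\{k\}) \right) \cdot \lambda \;+\; O_{\lambda \to 0}\left( \frac{\mathbb{E} f^2}{\mathbb{E} f} \cdot \lambda^{4/3} \right) \;+\; O_{\lambda \to 0}\left( \frac{\mathbb{E}^2 f^2}{\mathbb{E}^3 f} \cdot \lambda^{2} \right).
\]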


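We will now proceed with the proof of the theorem, and prove the lemma below.
Applying Lemma 6.1 to functions $f$ and $1 - f$ and taking into account (27) gives

\[
\mathrm{Ent}\big(T_\epsilon f\big) + \mathrm{Ent}\big(T_\epsilon (1 - f)\big) \;\le\; \frac{1}{\mathbb{E} f \, (1 - \mathbb{E} f)} \cdot \frac{1}{2 \ln 2} \cdot \left( \sum_{k=1}^{n} \hat{f}^2(\{k\}) \right) \cdot \lambda \;+\; O_{\lambda \to 0}\big(\lambda^{4/3}\big).
\]

Combining the two inequalities for $\mathrm{Ent}\big(T_\epsilon f\big) + \mathrm{Ent}\big(T_\epsilon (1 - f)\big)$, recalling $1 - H_2(\epsilon) \ge \frac{\lambda}{2 \ln 2}$,
and using (27), we get

\[
\sum_{k=1}^{n} \hat{f}^2(\{k\}) \;\ge\; \mathbb{E} f \cdot (1 - \mathbb{E} f) - O\big(\lambda^{1/3}\big).
\]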

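For a boolean function $f : \{0,1\}^n \to \{0,1\}$ holds $\mathbb{E} f^2 = \mathbb{E} f$, and consequently $\sum_{S \ne 0} \hat{f}^2(S) = \mathbb{E} f (1 - \mathbb{E} f)$. Hence we have

\[
\sum_{|S| \ge 2} \hat{f}^2(S) \;\le\; O\big(\lambda^{1/3}\big).
\]

Let $g = 2f - 1$. Then $g : \{0,1\}^n \to \{-1,1\}$, and $\sum_{|S| \ge 2} \hat{g}^2(S) = 4 \cdot \sum_{|S| \ge 2} \hat{f}^2(S) \le O\big(\lambda^{1/3}\big)$.
Note also that (27) implies $2c - 1 \le \hat{g}(0) = 2\,\mathbb{E} f - 1 \le 0$.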

Hence, assuming $\lambda$ is sufficiently small, Theorem 5.5 implies that $|\mathbb{E} g| \le O\Big(\lambda^{1/3} \cdot \sqrt{\ln \tfrac{1}{\lambda}}\Big)$, and
there exists an index $1 \le k \le n$ such that $\hat{g}^2(\{k\}) \ge 1 - O\big(\lambda^{1/3}\big)$.

5 Asymptotic notation hides absolute constants independent of the remaining parameters. 
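This means that

\[
\frac{1}{2} - O\Big(\lambda^{1/3} \cdot \sqrt{\ln \tfrac{1}{\lambda}}\Big) \;\le\; \mathbb{E} f \;\le\; \frac{1}{2} \quad \text{and} \quad \big|\hat{f}(\{k\})\big| \;\ge\; \frac{1}{2} - O\big(\lambda^{1/3}\big).
\]

If $\lambda$ is sufficiently small, $f$ satisfies the conditions of Theorem 1.14. By the second claim of this
theorem,

\[
\mathrm{Ent}\big(T_\epsilon f\big) + \mathrm{Ent}\big(T_\epsilon (1 - f)\big) \;\le\; 1 - H_2(\epsilon),
\]

completing the proof of Theorem 1.15. ∎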

6.4.1 Proof of Lemma 6.1

The argument below is a slight extension of an argument in Ca¬ 
In the following, we may and will assume, by homogeneity, that $\mathbb{E} f = 1$.

Let us introduce some notation. For $x \in \{0,1\}^n$, let $x^c$ be the complement of $x$, that is the
element of $\{0,1\}^n$ with $x^c_i = 1 - x_i$ for all $1 \le i \le n$.

For a nonnegative function $g$ on $\{0,1\}^n$, let $g_0$ be the 'even' part of $g$ defined by $g_0(x) = \big(g(x) + g(x^c)\big)/2$,
and let $g_1 = g - g_0$ be the 'odd' part of $g$. By definition, $g_0(x) = g_0(x^c)$ and $g_1(x) = -g_1(x^c)$.

Note also that $|g_1| \le g_0$.

We will need the following well-known (and easy to verify) fact:

\[
g_0 = \sum_{|S| \text{ even}} \hat{g}(S) \cdot W_S \quad \text{and} \quad g_1 = \sum_{|S| \text{ odd}} \hat{g}(S) \cdot W_S.
\]
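
The even/odd decomposition is easy to confirm numerically. The sketch below assumes the standard Walsh-Fourier conventions $W_S(x) = (-1)^{\sum_{i \in S} x_i}$ and $\hat{g}(S) = \mathbb{E}\, g \cdot W_S$; the fact rests on $W_S(x^c) = (-1)^{|S|} \cdot W_S(x)$. The helper names are ours and purely illustrative.

```python
import itertools
import random


def walsh(S, x):
    """Walsh function W_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def fourier(g, n):
    """All Fourier coefficients of g (a dict on {0,1}^n), keyed by the tuple S."""
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            coeffs[S] = sum(v * walsh(S, x) for x, v in g.items()) / len(g)
    return coeffs


def parts_from_fourier(g, n):
    """Even and odd parts obtained by summing Fourier terms with |S| even / odd."""
    coeffs = fourier(g, n)
    even = {x: sum(c * walsh(S, x) for S, c in coeffs.items() if len(S) % 2 == 0) for x in g}
    odd = {x: sum(c * walsh(S, x) for S, c in coeffs.items() if len(S) % 2 == 1) for x in g}
    return even, odd


def parts_direct(g):
    """Even and odd parts from the definition g_0(x) = (g(x) + g(x^c)) / 2, g_1 = g - g_0."""
    comp = lambda x: tuple(1 - b for b in x)
    even = {x: (g[x] + g[comp(x)]) / 2 for x in g}
    odd = {x: g[x] - even[x] for x in g}
    return even, odd


def random_nonnegative(n, seed=0):
    """A random nonnegative function on {0,1}^n, used only as test input."""
    rng = random.Random(seed)
    return {x: rng.random() for x in itertools.product((0, 1), repeat=n)}
```

Comparing `parts_from_fourier` with `parts_direct` on random nonnegative inputs shows agreement up to rounding, and the pointwise bound $|g_1| \le g_0$ can be read off from the second function.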


We start with an auxiliary claim. 

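Lemma 6.2: For a function $g$ with $\mathbb{E} g = 1$ holds

\[
\mathrm{Ent}(g) \;=\; \mathrm{Ent}(g_0) + \mathbb{E}_x\, g_0(x) \cdot \left( 1 - H_2\left( \frac{1 - |g_1(x)| / g_0(x)}{2} \right) \right).
\]

Here for $x$ such that $g_0(x) = g_1(x) = 0$, the expression $g_0(x) \cdot \Big( 1 - H_2\Big( \frac{1 - |g_1(x)| / g_0(x)}{2} \Big) \Big)$ is
interpreted as 0.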

Proof: We have

\[
\mathrm{Ent}(g) \;=\; \mathbb{E}\, g(x) \log g(x) \;=\; \frac{1}{2} \cdot \mathbb{E} \Big( g(x) \log g(x) + g(x^c) \log g(x^c) \Big) \;=\;
\]
\[
\frac{1}{2} \cdot \mathbb{E} \Big( \big(g_0(x) + g_1(x)\big) \cdot \log \big(g_0(x) + g_1(x)\big) + \big(g_0(x) - g_1(x)\big) \cdot \log \big(g_0(x) - g_1(x)\big) \Big).
\]

It is easy to verify that for any $0 \le b \le a$ holds

\[
\frac{1}{2} \cdot \Big( (a + b) \log (a + b) + (a - b) \log (a - b) \Big) \;=\; a \log a + a \cdot \left( 1 - H_2\left( \frac{1 - b/a}{2} \right) \right),
\]

where the last expression should be interpreted as 0 for $a = b = 0$.

Using this identity with $a = g_0(x)$ and $b = |g_1(x)|$ gives the claim of the lemma. ∎
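
Both the pointwise identity and the lemma can be checked numerically. The sketch below assumes base-2 logarithms throughout and reads the lemma as $\mathrm{Ent}(g) = \mathrm{Ent}(g_0) + \mathbb{E}_x\, g_0(x) \cdot \big(1 - H_2\big((1 - |g_1(x)|/g_0(x))/2\big)\big)$ for $\mathbb{E} g = 1$. The function names are ours and purely illustrative.

```python
import itertools
import math
import random


def h2(p):
    """Binary entropy H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def xlog2x(p):
    """p * log2(p), with the convention 0 * log 0 = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def ent(values):
    """Ent(g) = E[g log g] - E[g] log E[g] for g listed pointwise (uniform measure)."""
    mean = sum(values) / len(values)
    return sum(xlog2x(v) for v in values) / len(values) - xlog2x(mean)


def identity_gap(a, b):
    """Difference between the two sides of the pointwise identity, for 0 <= b <= a, a > 0."""
    left = 0.5 * (xlog2x(a + b) + xlog2x(a - b))
    right = xlog2x(a) + a * (1 - h2((1 - b / a) / 2))
    return left - right


def lemma_rhs(g):
    """Ent(g_0) + E_x g_0(x) * (1 - H_2((1 - |g_1(x)| / g_0(x)) / 2)) for a dict g."""
    comp = lambda x: tuple(1 - b for b in x)
    g0 = {x: (g[x] + g[comp(x)]) / 2 for x in g}
    g1 = {x: g[x] - g0[x] for x in g}
    total = 0.0
    for x in g:
        if g0[x] > 0:  # the term is interpreted as 0 when g_0(x) = 0
            total += g0[x] * (1 - h2((1 - abs(g1[x]) / g0[x]) / 2))
    return ent(list(g0.values())) + total / len(g)


def random_mean_one(n, seed=0):
    """A random nonnegative function on {0,1}^n normalised to have expectation 1."""
    rng = random.Random(seed)
    raw = {x: rng.random() for x in itertools.product((0, 1), repeat=n)}
    mean = sum(raw.values()) / len(raw)
    return {x: v / mean for x, v in raw.items()}
```

On random inputs `identity_gap` vanishes up to rounding, and `ent(list(g.values()))` agrees with `lemma_rhs(g)`.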

Next, as in [13], we upper bound $1 - H_2\big(\frac{1 - x}{2}\big)$ by $\frac{x^2}{2 \ln 2} + \big(1 - \frac{1}{2 \ln 2}\big) \cdot x^4$.
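
A quick grid check of this bound, assuming it reads $1 - H_2\big(\frac{1 - x}{2}\big) \le \frac{x^2}{2 \ln 2} + \big(1 - \frac{1}{2 \ln 2}\big) x^4$ for $0 \le x \le 1$. This follows from the expansion $1 - H_2\big(\frac{1 - x}{2}\big) = \frac{1}{\ln 2} \sum_{k \ge 1} \frac{x^{2k}}{2k(2k - 1)}$ together with $x^{2k} \le x^4$ for $k \ge 2$. The names `lhs` and `rhs` are ours.

```python
import math


def h2(p):
    """Binary entropy H_2(p) in bits, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def lhs(x):
    """1 - H_2((1 - x) / 2)."""
    return 1 - h2((1 - x) / 2)


def rhs(x):
    """x^2 / (2 ln 2) + (1 - 1 / (2 ln 2)) * x^4."""
    c = 1 / (2 * math.log(2))
    return c * x ** 2 + (1 - c) * x ** 4
```

The two sides coincide at $x = 0$ and $x = 1$, and the quadratic term alone is a lower bound, which is the inequality $1 - H_2(\epsilon) \ge \lambda/(2 \ln 2)$ with $x = 1 - 2\epsilon$.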

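Substituting this bound in the claim of Lemma 6.2 gives

\[
\mathrm{Ent}(g) \;\le\; \mathrm{Ent}(g_0) + \frac{1}{2 \ln 2} \cdot \mathbb{E}_x\, \frac{g_1^2(x)}{g_0(x)} + \left( 1 - \frac{1}{2 \ln 2} \right) \cdot \mathbb{E}_x\, \frac{g_1^4(x)}{g_0^3(x)}.
\]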


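Let $g = T_\epsilon f$. It is easy to verify $\big(T_\epsilon f\big)_i = T_\epsilon f_i$ for any function $f$ and $i = 0, 1$. Consequently,
we get the bound

\[
\mathrm{Ent}\big(T_\epsilon f\big) \;\le\; \mathrm{Ent}\big(T_\epsilon f_0\big) + \frac{1}{2 \ln 2} \cdot \mathbb{E}_x\, \frac{\big(T_\epsilon f_1(x)\big)^2}{T_\epsilon f_0(x)} + \left( 1 - \frac{1}{2 \ln 2} \right) \cdot \mathbb{E}_x\, \frac{\big(T_\epsilon f_1(x)\big)^4}{\big(T_\epsilon f_0(x)\big)^3}.
\]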


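We upper bound each of the summands on the RHS separately.


1. The first summand. Note that $\mathbb{E}\, T_\epsilon f_0 = \mathbb{E} f_0 = \mathbb{E} f = 1$. Recall also (see e.g., [14]) that
for any function $g$ on $\{0,1\}^n$ holds $T_\epsilon g = \sum_S \lambda^{|S|/2}\, \hat{g}(S) \cdot W_S$.

Hence, by Lemma 5.4,

\[
\mathrm{Ent}\big(T_\epsilon f_0\big) \;\le\; O\Big( \sum_{S \ne 0} \widehat{T_\epsilon f_0}^{\,2}(S) \Big) \;=\; O\Big( \sum_{S \ne 0} \lambda^{|S|}\, \hat{f_0}^{2}(S) \Big) \;=\; O\Big( \sum_{S \ne 0,\ |S| \text{ even}} \lambda^{|S|}\, \hat{f}^{2}(S) \Big) \;=\; O\big( \mathbb{E} f^2 \cdot \lambda^2 \big).
\]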

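2. The second summand.

First, we argue that $T_\epsilon f_0$ is bounded away from 0 with high probability. Recall that
$\mathbb{E}\, T_\epsilon f_0 = 1$, and note that $\mathrm{Var}\big(T_\epsilon f_0\big) = \sum_{S \ne 0} \widehat{T_\epsilon f_0}^{\,2}(S) = O\big(\lambda^2 \cdot \mathbb{E} f^2\big)$. Hence, by
Chebyshev's inequality, for any $0 < a < 1$ holds

\[
\Pr\big\{ T_\epsilon f_0 \le a \big\} \;\le\; O\left( \frac{\lambda^2 \cdot \mathbb{E} f^2}{(1 - a)^2} \right). \qquad (28)
\]

Second, recall that for any $x$ holds $|T_\epsilon f_1(x)| \le T_\epsilon f_0(x)$, and hence $\frac{(T_\epsilon f_1(x))^2}{T_\epsilon f_0(x)} \le T_\epsilon f_0(x)$.
Therefore, taking $a = 1 - \lambda^{1/3}$ in (28), we have $\mathbb{E}_x\, \frac{(T_\epsilon f_1(x))^2}{T_\epsilon f_0(x)}$ bounded from above by

\[
\Pr\big\{ T_\epsilon f_0 \le 1 - \lambda^{1/3} \big\} \cdot a + \Big( 1 + O\big(\lambda^{1/3}\big) \Big) \cdot \mathbb{E} \big(T_\epsilon f_1(x)\big)^2.
\]

Recalling that $T_\epsilon f_1 = \sum_{|S| \text{ odd}} \lambda^{|S|/2}\, \hat{f}(S) \cdot W_S$, this is at most

\[
O\big( \mathbb{E} f^2 \cdot \lambda^{4/3} \big) + \Big( 1 + O\big(\lambda^{1/3}\big) \Big) \cdot \left( \Big( \sum_{k=1}^{n} \hat{f}^2(\{k\}) \Big) \cdot \lambda + O\big( \mathbb{E} f^2 \cdot \lambda^3 \big) \right) \;\le\; \Big( \sum_{k=1}^{n} \hat{f}^2(\{k\}) \Big) \cdot \lambda + O\big( \mathbb{E} f^2 \cdot \lambda^{4/3} \big).
\]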


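3. The third summand. Note that $\mathbb{E} f_1 = 0$. Hence, as in Lemma 1 in [14] (where the
requirement on $f$ to be boolean does not seem to be necessary) we have, for a sufficiently
small $\lambda$, that

\[
\mathbb{E} \big(T_\epsilon f_1(x)\big)^4 \;\le\; O\Big( \big(\mathbb{E} f^2(x)\big)^2 \cdot \lambda^2 \Big) \;=\; O\big( \mathbb{E}^2 f^2 \cdot \lambda^2 \big).
\]

We can now upper bound the third summand using the Chebyshev inequality, as above.
Taking $a = 1/2$ in (28) upper bounds $\mathbb{E}_x\, \frac{(T_\epsilon f_1(x))^4}{(T_\epsilon f_0(x))^3}$ by

\[
O\big( \mathbb{E} f^2 \cdot \lambda^2 \big) \cdot a + O\big( \mathbb{E}^2 f^2 \cdot \lambda^2 \big) \;=\; O\big( \mathbb{E} f^2 \cdot \lambda^2 \big) + O\big( \mathbb{E}^2 f^2 \cdot \lambda^2 \big).
\]

Combining these estimates leads to the claim of Lemma 6.1. ∎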